Systems and methods for generating and deploying machine learning applications

ABSTRACT

A method comprising receiving data associated with a business, the data comprising first values for first attributes; processing the data, in accordance with a common data attribute schema that indicates second attributes, to generate second values for at least some of the second attributes including a group of attributes, the second values including a group of attribute values for the group of attributes; identifying, using the common data attribute schema and from among pre-existing software codes, software code implementing an ML data processing pipeline configured to generate a group of feature values; processing the group of attribute values with the software code to obtain the group of feature values; and either providing the group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or using the group of feature values to train the ML model.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority, under 35 U.S.C. § 119, to U.S. Provisional Patent Application Ser. No. 63/227,975, filed on Jul. 30, 2021, titled “Systems and Methods for Generating and Deploying Machine Learning Applications”, which is incorporated by reference herein in its entirety.

BACKGROUND

Machine learning models are widely applied to multiple different types of problems in multiple different applications. A machine learning model contains multiple parameters. Prior to being applied to a particular problem, a machine learning model is trained by using training data to estimate values of its parameters. The resulting trained machine learning model may be applied to input data to produce corresponding outputs.

SUMMARY

Some embodiments provide for a method for using virtualized machine learning (ML) application programs associated with different ML tasks, the method comprising: using at least one computer hardware processor, which is configured to execute virtualized application programs, to perform: loading a virtualized ML application program comprising a plurality of software modules, the plurality of software modules including: a first software module configured to apply one or more ML data processing pipelines to received data to generate input data for providing as input to an ML model associated with a respective ML task; and a second software module configured to perform inference using the ML model; using the first software module to: (a) receive data, via at least one communication network, from a source external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data; and using the second software module to apply the ML model to the input data generated using the first software module to produce corresponding ML model output.

Some embodiments provide for a system, comprising: at least one computer hardware processor configured to execute virtualized application programs; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method for using virtualized machine learning (ML) application programs associated with different ML tasks, the method comprising: loading a virtualized ML application program comprising a plurality of software modules, the plurality of software modules including: a first software module configured to apply one or more ML data processing pipelines to received data to generate input data for providing as input to an ML model associated with a respective ML task; and a second software module configured to perform inference using the ML model; using the first software module to: (a) receive data, via at least one communication network, from a source external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data; and using the second software module to apply the ML model to the input data generated using the first software module to produce corresponding ML model output.

Some embodiments provide for at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor configured to execute virtualized application programs, cause the at least one computer hardware processor to perform a method for using virtualized machine learning (ML) application programs associated with different ML tasks, the method comprising: loading a virtualized ML application program comprising a plurality of software modules, the plurality of software modules including: a first software module configured to apply one or more ML data processing pipelines to received data to generate input data for providing as input to an ML model associated with a respective ML task; and a second software module configured to perform inference using the ML model; using the first software module to: (a) receive data, via at least one communication network, from a source external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data; and using the second software module to apply the ML model to the input data generated using the first software module to produce corresponding ML model output.

In some embodiments, the plurality of software modules further comprises a third software module configured to train the ML model, the method further comprising: using the third software module to train the ML model using the input data.

In some embodiments, the input data comprises training data having a plurality of training inputs and a corresponding plurality of ground-truth outputs, the ML model comprises a plurality of parameters, and using the third software module to train the ML model comprises estimating values of at least some of the plurality of parameters using the training data.

In some embodiments, the estimating is performed using one or more software libraries that are part of the third software module in the virtualized ML application program.

In some embodiments, the plurality of software modules further comprises a fourth software module configured to generate information explaining performance of the ML model, the method further comprising: using the fourth software module to generate information explaining performance of the ML model on the input data.

In some embodiments, the input data comprises a plurality of values for a respective plurality of features; and the information explaining performance of the ML model on the input data indicates, for at least some of the plurality of features, relative degrees to which the at least some of the plurality of features influenced the ML model output.

In some embodiments, the input data comprises a plurality of values for a respective plurality of features; and the information explaining performance of the ML model on the input data indicates, for each of at least some of the plurality of features, a sensitivity of the ML model output to changes in a value of the feature.

In some embodiments, the ML model comprises a multi-layer neural network configured to detect objects in images, the input data comprises an input image, and the information explaining performance of the ML model on the input data comprises information explaining performance of the multi-layer neural network on the input image.

In some embodiments, the information explaining the performance of the multi-layer neural network on the input image comprises: information indicating, for at least some pixels in the input image, relative degrees to which the at least some of the pixels influenced the ML model output; and information indicating, for each of the at least some of the pixels, a sensitivity of the ML model output to changes in a value of the pixel.

In some embodiments, the received data comprises attribute values for a plurality of attributes, the input data comprises feature values for a plurality of features, and the one or more ML data processing pipelines are configured to process the received data by: applying one or more data cleansing procedures to at least some of the attribute values to obtain cleansed attribute values; and applying one or more feature extraction procedures to the cleansed attribute values to obtain the feature values.

In some embodiments, the plurality of attributes comprises groups of attributes including a first group of attributes and a second group of attributes, the attribute values comprise groups of attribute values including a first group of attribute values for the first group of attributes and a second group of attribute values for the second group of attributes, the plurality of features comprises groups of features including a first group of features and a second group of features, the feature values comprise groups of feature values including a first group of feature values for the first group of features and a second group of feature values for the second group of features, and the one or more ML data processing pipelines comprise: a first ML data processing pipeline to generate the first group of feature values from the first group of attribute values using first data cleansing procedures and first feature extraction procedures; a second ML data processing pipeline to generate the second group of feature values from the second group of attribute values using second data cleansing procedures different from the first data cleansing procedures and second feature extraction procedures different from the first feature extraction procedures.

In some embodiments, the source external to the virtualized ML application program comprises a data store that is part of a computing system that does not execute the virtualized ML application program.

In some embodiments, the method further comprises providing, via the at least one communication network, the ML model output to the source external to the virtualized ML application program.

In some embodiments, the first software module comprises processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to (a) receive the data, via the at least one communication network, from the source external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data.

In some embodiments, the second software module comprises processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to apply the ML model to the input data generated using the first software module to produce the corresponding ML model output.

In some embodiments, the at least one computer hardware processor is configured to execute a virtualized application software engine, the virtualized application software engine is configured to execute multiple different virtualized ML application programs corresponding to different ML tasks, and the method further comprises executing, with the virtualized application software engine, the multiple different virtualized ML application programs corresponding to the different ML tasks.

In some embodiments, the virtualized ML application program comprises a virtual machine configured to execute an ML application program comprising the plurality of software modules.

In some embodiments, the virtualized ML application program comprises a containerized application program comprising the plurality of software modules.

In some embodiments, the ML model comprises a linear regression model or a non-linear regression model.

In some embodiments, the ML model comprises an ML model configured to map inputs to outputs in a finite set of outputs corresponding to classification labels or actions.

Some embodiments provide for a method, comprising: using at least one computer hardware processor to perform: (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.

Some embodiments provide for a system, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method, comprising: (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.

Some embodiments provide for at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method, comprising: (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.

In some embodiments, the at least some of the second plurality of attributes include a second group of attributes different from the first group of attributes, and the second plurality of values includes a second group of attribute values for the second group of attributes, act (C) further comprises: identifying, using the common data attribute schema and from the plurality of pre-existing software codes, second software code implementing a second ML data processing pipeline, different from the first ML data processing pipeline, configured to generate a second group of feature values, for a respective second group of features, from the second group of attribute values, act (D) further comprises: processing the second group of attribute values with the second software code to obtain the second group of feature values, and act (E) further comprises: either: (i) providing the second group of feature values as inputs to the ML model for generating the corresponding ML model outputs, or (ii) using the second group of feature values to train the ML model.

In some embodiments, acts (A)-(E) are performed by a virtualized ML application program executing using the at least one processor.

In some embodiments, the common data attribute schema indicates which attributes in the second plurality of attributes are mandatory or optional.

In some embodiments, processing the first data comprises: accessing values for those attributes, among the first plurality of attributes, that are indicated as being mandatory by the common data attribute schema; and generating an error notification when the first data does not include values for at least one of the attributes indicated as being mandatory by the common data attribute schema.

In some embodiments, the common data attribute schema indicates a format for the second plurality of values, and processing the first data in accordance with the common data attribute schema comprises formatting the accessed values according to the format indicated by the common data attribute schema.

In some embodiments, the common data attribute schema categorizes attributes in the second plurality of attributes into multiple categories, the multiple categories including: a common attribute category; a market segment attribute category; and a business specific attribute category.

In some embodiments, the method further comprises updating the common data attribute schema to include one or more attributes that are part of the first plurality of attributes, but not part of the second plurality of attributes.

In some embodiments, acts (C) and (D) are performed automatically based on information in the common data attribute schema.

In some embodiments, the first software code implementing the first ML data processing pipeline is configured to, when executed, generate the first group of feature values from the first group of attribute values using first data cleansing procedures and first feature extraction procedures.

In some embodiments, the second software code implementing the second ML data processing pipeline is configured to, when executed, generate the second group of feature values from the second group of attribute values using second data cleansing procedures different from the first data cleansing procedures and second feature extraction procedures different from the first feature extraction procedures.

It should be appreciated that all combinations of the foregoing concepts and additional concepts described in greater detail below (provided such concepts are not mutually inconsistent) are contemplated as being part of the inventive subject matter disclosed herein. In particular, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the inventive subject matter disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Various non-limiting embodiments of the technology will be described with reference to the following figures. It should be appreciated that the figures are not necessarily drawn to scale.

FIG. 1 is a block diagram showing a conventional approach to generating and deploying machine learning (ML) models;

FIGS. 2A-2F are block diagrams illustrating techniques for generating and deploying an ML model in a virtualized ML application program, in accordance with some embodiments of the technology described herein;

FIG. 3 is a block diagram illustrating a common data attribute schema used for generating and deploying an ML model in a virtualized ML application program, in accordance with some embodiments of the technology described herein;

FIG. 4 is a block diagram illustrating another common data attribute schema, in accordance with some embodiments of the technology described herein;

FIGS. 5A-5B are block diagrams illustrating embodiments of a virtualized ML application program, in accordance with some embodiments of the technology described herein;

FIG. 6A is a block diagram illustrating data cleansing procedures for a virtualized ML application program, in accordance with some embodiments of the technology described herein;

FIG. 6B is a block diagram illustrating feature extraction procedures for a virtualized ML application program, in accordance with some embodiments of the technology described herein;

FIG. 7 is a flowchart of an illustrative process for generating and deploying ML applications for multiple businesses, in accordance with some embodiments of the technology described herein;

FIGS. 8A-8C are block diagrams illustrating changes made to a common data attribute schema based on business data received from different businesses, in accordance with some embodiments of the technology described herein;

FIG. 9 is a flowchart of an illustrative process for processing data based on a common data attribute schema for using or training an ML model, in accordance with some embodiments of the technology described herein;

FIG. 10 is a flowchart of an illustrative process for using a virtualized ML application program associated with one or more ML tasks, in accordance with some embodiments of the technology described herein;

FIG. 11 is another block diagram illustrating techniques for generating and deploying an ML model in a virtualized ML application program, in accordance with some embodiments of the technology described herein; and

FIG. 12 schematically illustrates components of a computer that may be used to implement some embodiments of the technology described herein.

DETAILED DESCRIPTION

Aspects of the technology described herein relate to improvements in machine learning (ML) technology. In particular, the inventors have developed improved systems and methods for generating and deploying ML applications.

Enterprises, businesses, and individuals apply ML techniques to collected data and use the resulting predictions in a wide variety of applications including for improving processes and for aiding decision making. For example, a business may apply ML techniques to data collected for multiple customers and create an ML model to determine whether to send a promotional message to one or more of the business's customers. Conventionally, a business can hire a data analyst (or a data analysis firm), give them some data pertaining to the business's customers, and ask the data analyst to manually process the data and create an ML model for a problem the business is trying to address, such as a decision whether to send a promotional message to one or more customers. The data analyst manually reviews and processes the business's data, and creates an ML model for making the decision of whether to send the promotional message. While such an approach can be very customized, it is time consuming, expensive, and error prone, as the collected data needs to be manually processed (e.g., hand picking data attributes and creating one or more features from the data attributes), and a custom data processing and model implementation needs to be created and deployed (e.g., on a specific platform for the business) for each problem that the business wants to address using its collected data. Further, if a new business in a same or similar market segment and having a similar problem were to hire the data analyst, conventionally the data analyst would apply the same manual approach all over again. The data analyst would not be able to leverage the previously-developed software or ML models to facilitate developing a new ML model for the similar problem.

The inventors have recognized that conventional techniques for generating and deploying ML models fail to leverage past data processing and ML model implementations for addressing similar problems for which the data analyst may have manually processed collected data and created an ML model. For example, FIG. 1 illustrates a conventional approach 100 for generating and deploying ML models. As shown in FIG. 1, after the data analyst has created an ML data processing pipeline 104 (e.g., software code for data cleansing, feature extraction, and/or any other suitable code for processing input data) for a business, the business may supply business data 102 to the existing ML data processing pipeline 104 to train or use ML models to solve the business's problem. However, if a new business hires the data analyst and provides business data 106 or the existing business provides new business data that is different (e.g., collected from a different source within the business), the data analyst cannot leverage the existing ML data processing pipeline 104. Instead, the data analyst (or multiple analysts, as the case may be, as shown in 108) has to expend considerable time in order to develop a new ML data processing pipeline 110, which is expensive. Further, even after expending considerable resources, the new ML pipeline may contain multiple errors because it does not leverage any of the existing software code previously developed for addressing a similar problem.

The inventors have recognized that conventional techniques for generating and deploying ML models suffer from numerous drawbacks described above and can be improved. The inventors have developed techniques to leverage pre-existing software for addressing a given ML task. For example, the described systems and methods can be used to identify, for a given ML task, pre-existing software code that implements one or more ML data processing pipelines to allow for rapidly generating and deploying ML models. In this way, the techniques developed by the inventors improve ML technology by leveraging pre-existing software to increase speed and reduce errors for data processing and generating and deploying ML models. Additionally, the developed techniques save time and cost compared to conventional techniques where data analysts manually review and process data in order to create ML models.

To this end, the inventors have developed a common data attribute schema to aid in collecting data from a business and mapping at least some of the received data to one or more pre-existing software codes implementing one or more ML data processing pipelines. The common data attribute schema indicates one or more attributes for which corresponding software codes for ML data processing pipelines are included in the pre-existing software codes. Attribute values are generated from the received data for at least some of the attributes indicated in the common data attribute schema. Groups of one or more attribute values are provided to corresponding software codes implementing ML data processing pipelines to generate input data for training an ML model and/or using the ML model for inference. The developed techniques can process the received data in an end-to-end manner by using the common data attribute schema to automatically generate attribute values, identify pre-existing software codes corresponding to groups of attribute values, direct the groups of attribute values to corresponding ML data processing pipelines to generate training data for training an ML model, and generate and deploy the ML model. In this way, the techniques developed by the inventors improve ML technology by using the common data attribute schema to automatically identify, from pre-existing software codes for ML data processing pipelines, software code to process received data from a business and automatically generate and deploy an ML model based on the processed data.
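The mapping from attribute groups to pre-existing pipeline code described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the names SCHEMA, PIPELINES, account_features, purchase_features, and build_features, the "Products Purchased Count" feature, and the placeholder feature logic are all hypothetical, and a real schema and pipeline store may be organized quite differently.

```python
from typing import Callable, Dict, List, Tuple

AttributeValues = Dict[str, str]
FeatureValues = Dict[str, float]
Pipeline = Callable[[AttributeValues], FeatureValues]


def account_features(values: AttributeValues) -> FeatureValues:
    # Placeholder logic: take the year from an ISO-formatted date string.
    return {"Account Open Year": float(values["Account Open Date"][:4])}


def purchase_features(values: AttributeValues) -> FeatureValues:
    # Placeholder logic: count semicolon-separated product names.
    items = [p for p in values["Product Purchased"].split(";") if p]
    return {"Products Purchased Count": float(len(items))}


# Hypothetical store of pre-existing pipeline code, keyed by pipeline name.
PIPELINES: Dict[str, Pipeline] = {
    "account_features": account_features,
    "purchase_features": purchase_features,
}

# Hypothetical schema: attribute group -> (attributes in group, pipeline name).
SCHEMA: Dict[str, Tuple[List[str], str]] = {
    "account": (["Account Open Date"], "account_features"),
    "purchases": (["Product Purchased"], "purchase_features"),
}


def build_features(record: AttributeValues) -> FeatureValues:
    """Route each attribute group found in the record to its pipeline."""
    features: FeatureValues = {}
    for attribute_names, pipeline_name in SCHEMA.values():
        if not all(name in record for name in attribute_names):
            continue  # this group is absent from the received data
        group_values = {name: record[name] for name in attribute_names}
        features.update(PIPELINES[pipeline_name](group_values))
    return features
```

In this sketch, the schema alone decides which pipeline code runs for which group of attribute values, so supporting a new attribute group would only require adding one schema entry and one pipeline function.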

There are various aspects of the techniques developed by the inventors that enable the improvements to ML technology described above. In some aspects, a method includes receiving first data associated with a business (e.g., a retailer or any other suitable business) and processing the received first data using a common data attribute schema. The first data includes first values for first attributes (e.g., attributes related to a customer of a retailer, such as Customer ID, Account Open Date, etc., or any other suitable attribute). The first data is processed in accordance with the common data attribute schema, which may be of any suitable type (e.g., as described with respect to FIG. 3) and which indicates second attributes, to generate second values for at least some of the second attributes. At least some of the second attributes for which the second values are generated include a first group of attributes. The second values for at least some of the second attributes include a first group of attribute values for the first group of attributes. The method further includes identifying, using the common data attribute schema and from among pre-existing software codes (e.g., software codes for implementing one or more ML data processing pipelines; also described with respect to FIG. 2D), first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values. For example, the first group of attribute values may include a value for the attribute Account Open Date for a customer of a retailer business. The respective first group of features may include Account Age, Account Open Year, Account Open Month, Days To Next Renewal, or any other suitable feature. The method further includes processing the first group of attribute values with the first software code to obtain the first group of feature values. The method further includes either: (i) providing the first group of feature values as inputs to an ML model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model (e.g., to initially train the ML model or to update a trained ML model).
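As a hedged illustration of the Account Open Date example, the following Python sketch derives three of the named features from a single attribute value. The function name account_open_date_features, the as_of parameter, and the choice to express Account Age in days are assumptions made for illustration. Days To Next Renewal is omitted because it depends on renewal terms that are not specified here.

```python
from datetime import date
from typing import Dict


def account_open_date_features(account_open_date: date, as_of: date) -> Dict[str, int]:
    """Derive a group of feature values from one attribute value."""
    return {
        # Assumption: account age is measured in whole days up to `as_of`.
        "Account Age": (as_of - account_open_date).days,
        "Account Open Year": account_open_date.year,
        "Account Open Month": account_open_date.month,
    }
```

The resulting dictionary plays the role of the first group of feature values, which could then be supplied to an ML model for inference or used as part of its training data.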

In some embodiments, the above-described acts are performed by a virtualized ML application program executing using at least one processor. In some embodiments, a virtualized application program may be any application program configured to execute on a virtual machine (e.g., a VMWARE virtual machine, an AZURE virtual machine, or any other suitable virtual machine). In some embodiments, a virtualized application program may be a containerized application program configured to execute in a container (e.g., a DOCKER container, a MOBY container, or any other suitable type of container or container framework).

In some embodiments, some of the second attributes include a second group of attributes different from the first group of attributes, and the second values include a second group of attribute values for the second group of attributes. The act of identifying the first software code further includes identifying, using the common data attribute schema and from the pre-existing software codes, second software code implementing a second ML data processing pipeline, different from the first ML data processing pipeline, configured to generate a second group of feature values, for a respective second group of features, from the second group of attribute values. The act of processing the first group of attribute values with the first software code to obtain the first group of feature values further comprises processing the second group of attribute values with the second software code to obtain the second group of feature values. The act of either providing the first group of feature values as inputs to an ML model or using the first group of feature values to train the ML model further includes either: (i) providing the second group of feature values as inputs to the ML model for generating the corresponding ML model outputs, or (ii) using the second group of feature values to train the ML model.

In some embodiments, the common data attribute schema indicates which attributes in the second attributes are mandatory or optional. For example, mandatory attributes may be required in order for the data to be processed by an ML data processing pipeline and may require the processor to generate an error notification when values are missing for one or more mandatory attributes (e.g., a common data attribute schema for an ML task for a retailer may specify Customer ID or any other suitable attribute as a mandatory attribute; also described with respect to FIG. 4). In another example, optional attributes may not be required in order for the data to be processed by an ML data processing pipeline. The processor may ignore the optional attribute, replace a value for the optional attribute with an average value, a default value, or any other suitable value for the optional attribute, or otherwise address the missing value for the optional attribute without generating an error notification.

In some embodiments, processing the first data comprises accessing values for those attributes, among the first attributes, that are indicated as being mandatory by the common data attribute schema and generating an error notification when the first data does not include values for at least one of the attributes indicated as being mandatory by the common data attribute schema. For example, the processor may perform these acts as part of a data quality check in order to ensure that values for mandatory attributes are available from the business.
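The mandatory and optional handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the described implementation: the names `SchemaAttribute`, `MissingMandatoryAttributeError`, `apply_schema`, and the example attributes are all hypothetical, and the error notification is modeled as a raised exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaAttribute:
    """One attribute in a (hypothetical) common data attribute schema."""
    name: str
    mandatory: bool
    default: object = None  # substituted only when an optional value is missing


class MissingMandatoryAttributeError(ValueError):
    """Stands in for the 'error notification' for missing mandatory values."""


def apply_schema(record: dict, schema: list[SchemaAttribute]) -> dict:
    # Data quality check: every mandatory attribute must have a value.
    missing = [a.name for a in schema if a.mandatory and record.get(a.name) is None]
    if missing:
        raise MissingMandatoryAttributeError(f"missing mandatory attributes: {missing}")
    # Optional attributes fall back to a default instead of raising an error.
    out = {}
    for attr in schema:
        value = record.get(attr.name)
        out[attr.name] = attr.default if value is None else value
    return out


# Assumed example schema for a retailer ML task.
SCHEMA = [
    SchemaAttribute("customer_id", mandatory=True),
    SchemaAttribute("account_open_date", mandatory=False),
    SchemaAttribute("annual_spend", mandatory=False, default=0.0),
]
```

Calling `apply_schema({"customer_id": "C1"}, SCHEMA)` fills the two optional attributes with their defaults, while a record without `customer_id` raises the error. A default value is only one of the options the text lists for optional attributes; an average value could be substituted in the same place.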

In some embodiments, the common data attribute schema indicates a format for the second values. The act of processing the first data in accordance with the common data attribute schema includes formatting the accessed values according to the format indicated by the common data attribute schema. For example, the format may include an ordering of the attributes, a file format (e.g., comma-separated values, EXCEL spreadsheet, etc.), or any other suitable format.

In some embodiments, the common data attribute schema categorizes attributes in the second plurality of attributes into multiple categories, the multiple categories including a common attribute category (e.g., a common data attribute schema for an ML task for a retailer may specify Customer ID as a common attribute, and a common data attribute schema for an ML task for a telecom provider may also specify Customer ID as a common attribute, where each of the retailer and the telecom provider can provide this attribute), a market segment attribute category (e.g., a common data attribute schema for an ML task for a retailer may specify Product Purchased as a market segment attribute provided by the retailer, while a common data attribute schema for an ML task for a telecom provider may specify Service Subscribed as a market segment attribute provided by the telecom provider), and a business specific attribute category (e.g., a common data attribute schema for an ML task for a first retailer may specify Product Model as a business specific attribute provided by the first retailer, while a common data attribute schema for an ML task for a second retailer may specify Product SKUID as a business specific attribute provided by the second retailer).

In some embodiments, the method further includes updating the common data attribute schema to include one or more attributes that are part of the first plurality of attributes but not part of the second plurality of attributes. For example, the common data attribute schema may be updated to include an attribute that was not previously present in the common data attribute schema (e.g., a common data attribute schema for an ML task for a first retailer may be updated to include an attribute when being used for an ML task for a second retailer or another business).

In some embodiments, the acts of identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline and processing the first group of attribute values with the first software code are performed automatically based on information in the common data attribute schema (e.g., without receiving additional user input for making this identification). For example, the information in the common data attribute schema may aid in mapping groups of attributes to ML data processing pipelines (e.g., as described with respect to FIG. 2C, or any other suitable information) for automatically making the identification.
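One way to picture the automatic identification is a lookup from groups of schema attributes to pipeline code. The sketch below is illustrative only: the registry contents, the pipeline names, and the rule that a pipeline is selected when all attributes in its group are available are assumptions, not details taken from the described embodiments.

```python
# Hypothetical registry standing in for the pre-existing software codes:
# each group of schema attributes maps to the name of a pipeline.
PIPELINE_REGISTRY = {
    frozenset({"customer_id", "account_open_date"}): "account_pipeline",
    frozenset({"customer_id", "page_views"}): "engagement_pipeline",
}


def identify_pipelines(available_attributes: set[str]) -> dict[str, list[str]]:
    """Return {pipeline name: attribute group} for every fully covered group."""
    selected = {}
    for group, pipeline in PIPELINE_REGISTRY.items():
        # Assumed rule: select a pipeline only if its whole group is present.
        if group <= available_attributes:
            selected[pipeline] = sorted(group)
    return selected
```

With the attributes `customer_id`, `account_open_date`, and `annual_spend` available, only `account_pipeline` is selected, and no user input is needed to make that choice.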

In some embodiments, the first software code implementing the first ML data processing pipeline is configured to, when executed, generate the first group of feature values from the first group of attribute values using first data cleansing procedures (e.g., outlier detection code, data normalization code, data quality checking code, data enrichment code, any other suitable data cleansing procedures, or no data cleansing procedures where no data cleansing may be required; also described with respect to FIG. 6A) and first feature extraction procedures (e.g., code for determining values for a first group of features, code for determining values for a second group of features, and so on; also described with respect to FIG. 6B).

In some embodiments, the second software code implementing the second ML data processing pipeline is configured to, when executed, generate the second group of feature values from the second group of attribute values using second data cleansing procedures different from the first data cleansing procedures and second feature extraction procedures different from the first feature extraction procedures.

Conventional techniques for deploying ML models involve deploying a trained ML model in a containerized application program, such as a DOCKER container application. However, such conventional deployments include only the trained ML model in the containerized application program. As a result, any steps to update the ML model, to prepare data to provide as input to the ML model, or to monitor performance of the ML model are conventionally performed offline, outside of the containerized application program rather than within it. After performing the offline analysis, a data analyst would re-deploy the ML model in a new containerized application program.

In contrast, the inventors have developed an improved approach where the ML model is deployed in a virtualized application program (e.g., a virtual machine configured to execute an application program, a containerized application program, or any other suitable virtualized application program) that includes not only a trained ML model but also additional software that allows for the deployed ML model to not only be used for inference, but also for training (e.g., re-training from scratch using new data or updating at least some of the already trained parameters using new data), monitoring performance of the ML model, and providing information explaining the performance of the ML model. Importantly, the additional software includes pre-existing software codes implementing ML data processing pipelines for processing input data to place it in condition for use in training the ML model or for providing as input to the ML model so that the ML model can perform inference. The inclusion of this additional software in the same containerized application as the ML model itself allows all such tasks to be performed within the containerized application program, without any need for performing offline analysis or re-deploying the ML model in a new containerized application program.

The inventors have developed techniques to deploy a virtualized ML application program (e.g., a virtual machine configured to execute an ML application program, a containerized application program, or any other suitable virtualized application program), including one or more ML data processing pipelines and one or more ML models. Further, because the virtualized ML application program includes the ML data processing pipelines, business data can be provided to the virtualized ML application program without need for external application of data cleansing procedures or feature extraction procedures. The virtualized ML application program can apply the ML data processing pipeline(s) to received data and generate input data for providing as input to the ML model(s) associated with a respective ML task. The virtualized ML application program can include multiple software modules, including software modules to apply one or more ML data processing pipelines, perform inference using one or more ML models, train one or more ML models, generate information explaining performance of the ML model, and/or any other suitable software modules. The developed techniques may involve deploying multiple virtualized ML application programs for different ML tasks (e.g., regression, classification, object detection, business problems such as customer churn probability, customer lifetime value, etc., or any other suitable ML tasks), where each virtualized ML application program may include software modules associated with a particular ML task. In this way, the techniques developed by the inventors improve ML technology by allowing one or more ML data processing pipelines and one or more ML models to be deployed in a virtualized ML application program.

There are various aspects of the techniques developed by the inventors that enable the improvements to ML technology described above. In some aspects, a method for using virtualized ML application programs associated with different ML tasks (e.g., different tasks for one customer, different tasks for different customers, or a suitable combination thereof) includes using a computer hardware processor, which is configured to execute virtualized application programs (e.g., a virtual machine configured to execute an ML application program, such as a VMWARE virtual machine or any other suitable virtual machine; a containerized application program, such as a DOCKER container application or any other suitable containerized application program; or any other suitable virtualized application program), to perform loading a virtualized ML application program including multiple software modules. The multiple software modules include a first software module configured to apply one or more ML data processing pipelines to received data to generate input data (e.g., a group of feature values for a respective group of features, or any other suitable input data) for providing as input to an ML model (e.g., a linear regression model, a non-linear regression model such as neural networks, support vector machines, etc., or any other suitable ML model) associated with a respective ML task. The method further includes using the first software module to: (a) receive data, via at least one communication network, from a source (e.g., a data store, a relational database, an object oriented database, a flat file, Hadoop, or any other suitable source of data) external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data. The multiple software modules further include a second software module configured to perform inference using the ML model. The method further includes using the second software module to apply the ML model to the input data generated using the first software module to produce corresponding ML model output.

In some embodiments, the multiple software modules include a third software module configured to train the ML model (e.g., initially train or update the ML model). The method further includes using the third software module to train the ML model using the input data.

In some embodiments, the input data includes training data having training inputs and corresponding ground-truth outputs. The ML model includes multiple parameters (e.g., hyperparameters or any other suitable model parameters). The act of using the third software module to train the ML model includes estimating values of at least some of the parameters using the training data (e.g., using a gradient descent algorithm, or any other suitable algorithm).
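As one concrete illustration of estimating parameter values from training inputs and ground-truth outputs, the sketch below fits a one-feature linear regression model with batch gradient descent on mean squared error. The model form, the function name `fit_linear`, the learning rate, the step count, and the toy data are assumptions chosen for illustration only.

```python
def fit_linear(xs: list[float], ys: list[float],
               lr: float = 0.05, steps: int = 2000) -> tuple[float, float]:
    """Estimate slope w and intercept b of y ~ w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (training inputs, ground-truth outputs).
w, b = fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this toy data the estimates approach a slope of 2 and an intercept of 1. In practice a software library such as those named in the text would typically perform this estimation.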

In some embodiments, the act of estimating values of at least some of the parameters is performed using one or more software libraries (e.g., TENSORFLOW, KERAS, or any other suitable software library) that are part of the third software module in the virtualized ML application program.

In some embodiments, the multiple software modules include a fourth software module configured to generate information explaining performance of the ML model. The method further includes using the fourth software module to generate information explaining performance of the ML model on the input data. For example, for an ML task for a retailer to predict a customer churn probability (e.g., a probability that the retailer will lose the customer in three months, six months, nine months, or any other suitable time period), the generated information may include input data features ordered by importance, feature sensitivity, or any other suitable information. In another example, for an ML task for a telecom provider to predict a customer churn probability, the generated information may indicate the most important features to be Product Age and Number of Complaints, which may indicate that the customer's device is too old and relatedly the customer has been making frequent complaints of bad reception or call drops. In yet another example, for a telecom provider to predict customer lifetime value (e.g., how much money the business will make from a customer over a lifetime of the customer-business relationship), the generated information may include predictions for how long a period of time the customer will maintain their relationship and how much the customer will spend on the business's services during this period of time. The spend may be positive, indicating the business will turn a profit over the course of the relationship, or negative, indicating the business will lose money over the course of the relationship.

In some embodiments, the input data includes values for respective features, and the information explaining performance of the ML model on the input data indicates, for at least some of the features, relative degrees to which the at least some of the features influenced the ML model output (e.g., features ordered by importance may be generated using a framework, such as Explainable Artificial Intelligence (XAI), or any other suitable framework). Following from the above example, for a customer churn probability for Customer A predicted to be 0.8 or 80%, the generated information may include features ordered by importance, Account Spend This Year 40%, Account Age 20%, Account Average Annual Spend 10%, and Page View This Year 8%. Similarly, for a customer churn probability for Customer B predicted to be 0.1 or 10%, the generated information may include features ordered by importance, Page View This Year 30%, Peer Group Average Page View This Year 18%, Account Spend This Year 15%, and Account Average Annual Spend 10%.

In some embodiments, the input data includes values for respective features, and the information explaining performance of the ML model on the input data indicates, for each of at least some of the features, a sensitivity of the ML model output to changes in a value of the feature (e.g., feature sensitivity may be generated using a framework, such as Explainable Artificial Intelligence (XAI), or any other suitable framework). Following from the above example, for a customer churn probability for Customer A predicted to be 0.8 or 80%, the generated information may include feature sensitivity, Account Spend This Year −0.1 (which indicates that if the Account Spend This Year feature is higher by 10%, the customer churn probability would be lowered by 1%). Similarly, for a customer churn probability for Customer B predicted to be 0.1 or 10%, the generated information may include feature sensitivity, Page View This Year −0.05 (which indicates that if the Page View This Year feature is higher by 10%, the customer churn probability would be lowered by 0.5%) and Peer Group Average Page View This Year 0.06 (which indicates that if the Peer Group Average Page View This Year feature is higher by 10%, the customer churn probability would be higher by 0.6%).
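The arithmetic in the sensitivity examples above can be checked with a short sketch. It assumes, as the examples suggest, that a sensitivity value multiplies a relative change in the feature to give the change in predicted probability; reading the "1%" as one percentage point is an interpretation, and the function name is hypothetical.

```python
def churn_probability_change(sensitivity: float, relative_feature_change: float) -> float:
    """Change in predicted churn probability for a relative change in one feature.

    Assumed linear reading of the examples: change = sensitivity * relative change.
    """
    return sensitivity * relative_feature_change


# Customer A: Account Spend This Year, sensitivity -0.1, feature higher by 10%.
delta_a = churn_probability_change(-0.1, 0.10)   # about -0.01 (one point lower)

# Customer B: Peer Group Average Page View This Year, sensitivity 0.06, +10%.
delta_b = churn_probability_change(0.06, 0.10)   # about +0.006 (0.6 points higher)
```

Under this reading, Customer A's predicted churn probability of 0.8 would move to roughly 0.79 if Account Spend This Year were 10% higher.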

In some embodiments, the ML model includes a multi-layer neural network configured to detect objects in images, the input data includes an input image, and the information explaining performance of the ML model on the input data includes information explaining performance of the multi-layer neural network on the input image. For example, in the case of a deep learning based object detection task, the input may be a captured image, and the output may be an object type for the input (e.g., car, human, cow, truck, house, or any other suitable object type). In this example, the information explaining performance of the neural network may include an importance of each pixel and feature sensitivity for each color channel of each pixel.

In some embodiments, the information explaining the performance of the multi-layer neural network on the input image includes information indicating, for at least some pixels in the input image, relative degrees to which the at least some of the pixels influenced the ML model output and information indicating, for each of the at least some of the pixels, a sensitivity of the ML model output to changes in a value of the pixel.

In some embodiments, the received data includes attribute values for multiple attributes, the input data includes feature values for multiple features, and one or more ML data processing pipelines are configured to process the received data by applying one or more data cleansing procedures to at least some of the attribute values to obtain cleansed attribute values and applying one or more feature extraction procedures to the cleansed attribute values to obtain the feature values. The attribute value may be scalar, time-series (e.g., vector), imagery (e.g., matrices), or any other suitable value.

In some embodiments, the multiple attributes include groups of attributes including a first group of attributes and a second group of attributes. The attribute values include groups of attribute values including a first group of attribute values for the first group of attributes and a second group of attribute values for the second group of attributes. The multiple features include groups of features including a first group of features and a second group of features. The feature values include groups of feature values including a first group of feature values for the first group of features and a second group of feature values for the second group of features. The ML data processing pipelines include a first ML data processing pipeline to generate the first group of feature values from the first group of attribute values using first data cleansing procedures and first feature extraction procedures. The ML data processing pipelines further include a second ML data processing pipeline to generate the second group of feature values from the second group of attribute values using second data cleansing procedures different from the first data cleansing procedures and second feature extraction procedures different from the first feature extraction procedures.

In some embodiments, the source external to the virtualized ML application program comprises a data store that is part of a computing system that does not execute the virtualized ML application program. For example, the computing system including the data store may be connected to the virtualized ML application program via a communication network (e.g., the Internet, a local network, or any other suitable communication network) or any other suitable means of communication.

In some embodiments, the method further includes providing, via the at least one communication network, the ML model output to the source external to the virtualized ML application program. For example, the source may be part of a computing system for the business and may process the ML model output to provide information to the business, compare ML model output to ground truth output, or make any other suitable use of the ML model output.

In some embodiments, the first software module comprises processor-executable instructions that, when executed by the computer hardware processor, cause the computer hardware processor to (a) receive the data, via the at least one communication network, from the source external to the virtualized ML application program; and (b) process the received data using the one or more ML data processing pipelines to generate the input data.

In some embodiments, the second software module comprises processor-executable instructions that, when executed by the computer hardware processor, cause the computer hardware processor to apply the ML model to the input data generated using the first software module to produce the corresponding ML model output.

In some embodiments, the computer hardware processor is configured to execute a virtualized application software engine configured to execute multiple different virtualized ML application programs corresponding to different ML tasks. The method further includes executing, with the virtualized application software engine, the multiple different virtualized ML application programs corresponding to the different ML tasks. For example, the virtualized application software engine may execute multiple different virtualized ML application programs corresponding to different ML tasks for the same business, for different businesses, or a suitable combination thereof.

In some embodiments, the virtualized ML application program includes a virtual machine (e.g., a VMWARE virtual machine or any other suitable virtual machine) configured to execute an ML application program comprising the multiple software modules.

In some embodiments, the virtualized ML application program includes a containerized application program (e.g., a DOCKER container application or any other suitable containerized application program) including the multiple software modules.

In some embodiments, the ML model includes a linear regression model or a non-linear regression model (e.g., neural networks, support vector machines, or any other suitable non-linear regression model).

In some embodiments, the ML model includes an ML model configured to map inputs to outputs in a finite set of outputs corresponding to classification labels (e.g., object types detected in an input image, or any other suitable classification labels) or actions (e.g., whether to send a promotional message to a customer, or any other suitable action).

In some embodiments, a value of an attribute may be a single value of any suitable type (e.g., integer, character, real number, Boolean, etc.). However, in some embodiments, a value of an attribute may include multiple numbers (e.g., a time series, a vector of values, an image), multiple characters (e.g., a string or multiple strings), etc. Thus, it should be appreciated that an attribute may have a value of any suitable type, as aspects of the technology described herein are not limited in this respect.

Following below are more detailed descriptions of various concepts related to, and embodiments of, techniques for generating and deploying ML models. It should be appreciated that various aspects described herein may be implemented in any of numerous ways. Examples of specific implementations are provided herein for illustrative purposes only. In addition, the various aspects described in the embodiments below may be used alone or in any combination and are not limited to the combinations explicitly described herein.

FIGS. 2A-2F are block diagrams illustrating techniques for generating and deploying an ML model in a virtualized ML application program. FIG. 2A shows block diagram 200 illustrating techniques that, among other things, identify, for a given ML task, pre-existing software code that implements one or more ML data processing pipelines to allow for rapidly generating and deploying ML models. This identification may be performed based on a common data attribute schema that aids in collecting business data from a business and mapping the business data to one or more software codes from pre-existing software codes for ML data processing pipelines.

Business data 202 may be received from a business (e.g., a retailer, a telecom provider, or any other suitable business). In some embodiments, business data 202 may be received from a source external to virtualized application 220, or any other suitable data source. Business data 202 may include values for one or more attributes (e.g., attributes related to a customer of a retailer, such as Customer ID, Account Open Date, etc., or any other suitable attribute). In some embodiments, a value of an attribute may be a single value of any suitable type (e.g., integer, character, real number, Boolean, etc.). However, in some embodiments, a value of an attribute may include multiple numbers (e.g., a time series, a vector of values, an image), multiple characters (e.g., a string or multiple strings), etc. Thus, it should be appreciated that an attribute may have a value of any suitable type, as aspects of the technology described herein are not limited in this respect.

Common data attribute schema 204 (e.g., as described with respect to FIG. 3, or any other suitable common data attribute schema) may indicate one or more attributes for which attribute values may be generated from business data 202. Common data attribute schema 204 may indicate attributes for which corresponding software codes for ML data processing pipelines (e.g., as described with respect to FIG. 2D, or any other suitable ML data processing pipelines) are stored in pre-existing software codes 212. Extract, Transform, Load (ETL) processing code 206 may be executed to process business data 202 in accordance with common data attribute schema 204 and store the results as transformed business data 208. ETL processing code 206 may include software code for processing attribute values according to a specified format, or any other suitable structure for storing the values. For example, for an attribute Account Open Date, ETL processing code 206 may transform a corresponding attribute value to be in MM-DD-YYYY format and store the transformed value in transformed business data 208. Software code for ETL processing code 206 may be provided by the business, a data analyst, or any other suitable entity. Identify software codes 210 may be executed to identify, using common data attribute schema 204 and from among pre-existing software codes 212, groups of attributes 218 and associated software codes 216 for ML data processing pipelines 226 (e.g., attribute group 1 and ML data processing pipeline 1, attribute group 2 and ML data processing pipeline 2, and so on). This mapping information 214 may be used to provide transformed business data 208 to virtualized application 220.
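The Account Open Date transformation mentioned above can be sketched as a small ETL helper. The source formats listed in `INPUT_FORMATS` and the function name `to_schema_date` are assumptions for illustration; only the MM-DD-YYYY target format comes from the description.

```python
from datetime import datetime

# Assumed formats in which a business might supply dates (hypothetical).
INPUT_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_schema_date(raw: str) -> str:
    """Convert a raw date string to the MM-DD-YYYY format used in the example."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m-%d-%Y")
        except ValueError:
            continue  # try the next assumed source format
    raise ValueError(f"unrecognized date value: {raw!r}")


transformed = {"account_open_date": to_schema_date("2021-07-30")}
```

A value that matches none of the assumed formats raises an error rather than being stored, which keeps malformed dates out of the transformed business data.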

Virtualized application 220 may include a virtualized ML application program executing using at least one processor (e.g., processor 1202 shown in computing system 1200 of FIG. 12, or any other suitable processor). In some embodiments, a virtualized application program may be any application program configured to execute on a virtual machine (e.g., a VMWARE virtual machine, an AZURE virtual machine, etc.). In some embodiments, a virtualized application program may be a containerized application program configured to execute in a container such as a DOCKER container, a MOBY container, etc. Further details for virtualized ML application programs are described with respect to FIGS. 5A-5B.

Virtualized application 220 may include ML data processing pipeline module 224 for processing transformed business data 208. In some embodiments, software code for an ML data processing pipeline may include data cleansing procedures, feature extraction procedures, and/or any other code for processing data to be provided for training an ML model or processed with a trained ML model. ML data processing pipeline module 224 may include software code for ML data processing pipeline 226, which may include data cleansing procedures 228 and feature extraction procedures 230. Data cleansing procedures 228 may include outlier detection code, data normalization code, data quality checking code, data enrichment code, and/or any other suitable code. Feature extraction procedures 230 may include code for determining values for one or more groups of features. ML data processing pipeline module 224 may include software code for multiple ML data processing pipelines for processing values from corresponding groups of attributes. In some embodiments, software code for each ML data processing pipeline may include data cleansing procedures and feature extraction procedures suited to the values from the group of attributes corresponding to the ML data processing pipeline. Further details for data cleansing procedures 228 and feature extraction procedures 230 are described with respect to FIGS. 6A and 6B, respectively.
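A pipeline of this shape, cleansing steps followed by feature extraction, might be organized as in the sketch below. The class name `Pipeline`, the specific cleansing steps, the outlier cap, and the `spend_vs_avg` feature are all hypothetical; they stand in for whatever procedures a given attribute group actually requires.

```python
from statistics import mean
from typing import Callable

Rows = list[dict]


def drop_missing_spend(rows: Rows) -> Rows:
    # Data quality check: discard rows with no spend value.
    return [r for r in rows if r.get("annual_spend") is not None]


def clip_outliers(rows: Rows, cap: float = 10_000.0) -> Rows:
    # Simple outlier handling: cap spend at an assumed maximum.
    return [{**r, "annual_spend": min(r["annual_spend"], cap)} for r in rows]


def spend_features(rows: Rows) -> Rows:
    # Feature extraction: spend relative to the group average.
    # Assumes at least one row and a non-zero average.
    avg = mean(r["annual_spend"] for r in rows)
    return [{"customer_id": r["customer_id"],
             "spend_vs_avg": r["annual_spend"] / avg} for r in rows]


class Pipeline:
    """Applies data cleansing steps in order, then one feature extraction step."""

    def __init__(self, cleansing: list[Callable[[Rows], Rows]],
                 extraction: Callable[[Rows], Rows]) -> None:
        self.cleansing = cleansing
        self.extraction = extraction

    def run(self, rows: Rows) -> Rows:
        for step in self.cleansing:
            rows = step(rows)
        return self.extraction(rows)


spend_pipeline = Pipeline([drop_missing_spend, clip_outliers], spend_features)
```

A second pipeline for a different attribute group would be built the same way with its own cleansing and extraction functions, which matches the idea that each group gets procedures suited to its values.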

For example, software code for ML data processing pipeline 226 may process attribute values from attribute group 1 (218) and generate feature values to be provided as input data 232 for training an ML model or processing with a trained ML model. Input data 232 may be provided as input to trained ML model 234 (deployed within virtualized application 220) for generating corresponding ML model outputs. Additionally or alternatively, input data 232 may be provided to training module 236 in order to train ML model 234 (e.g., to initially train ML model 234 or to update trained ML model 234). In some embodiments, training module 236 may not be included in virtualized application 220. For example, training module 236 may be present at a computing system external to virtualized application 220 and in communication with virtualized application 220 via a communication network (e.g., the Internet, a local network, or any other suitable communication network), or any other suitable computing system.

In some embodiments, ML data processing pipeline module 224 may include a custom ML data processing pipeline, including custom data cleansing procedures and custom feature extraction procedures. The software code for the custom ML data processing pipeline may be added for a new attribute (or attributes) in business data 202. For example, a new attribute may be an attribute that was not present in common data attribute schema 204 until business data 202 was provided. Common data attribute schema 204 may be updated to include the new attribute. Identify software codes 210 may identify, using common data attribute schema 204, a custom attribute group (including the new attribute) and associated software code for the custom ML data processing pipeline.

Virtualized application 220 may be in communication with business computing environment 222, e.g., via a communication network (e.g., the Internet, a local network, or any other suitable communication network) or any other suitable means of communication. In some embodiments, business computing environment 222 may be located at the premises of the business that provided business data 202, on a cloud server, or be otherwise accessible to the business. In some embodiments, virtualized application 220 may provide information to business computing environment 222. For example, virtualized application 220 may provide output from trained ML model 234 and optionally provide information explaining the ML output. For example, for an ML task for a telecom provider to predict a customer churn probability, the ML output may indicate a high customer churn probability and the generated information may indicate the most important features to be Product Age and Number of Complaints (e.g., which may indicate that the customer's device is too old and relatedly the customer has been making frequent complaints of bad reception or call drops, which may help explain the high customer churn probability).

In some embodiments, business computing environment 222 may provide information to virtualized application 220. For example, business computing environment 222 may provide new data to virtualized application 220. The new data may provide ground truth outputs for previous inputs on which trained ML model 234 was applied to generate ML outputs. Additionally or alternatively, the new data may include additional training data from recent customers to which trained ML model 234 has not yet been applied. The new data may be used to monitor performance of trained ML model 234 and update the ML model if the model performance is below a specified threshold. For example, if the model's accuracy for the ML output falls below a specified threshold, virtualized application 220 may update trained ML model 234 using prior training data and/or the new data. Additionally or alternatively, virtualized application 220 may update trained ML model 234 on a periodic basis (e.g., every week, every month, every two months, or any other suitable interval). Additionally or alternatively, virtualized application 220 may update trained ML model 234 when a threshold amount of new data is available (e.g., 20% of the size of the training data initially used to generate trained ML model 234, 50% of the size of the training data initially used to generate trained ML model 234, or any other suitable threshold).
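The three update triggers described above (accuracy below a threshold, a periodic interval, and a threshold amount of new data) can be combined in a small decision function. This is a sketch under stated assumptions: the names `RetrainPolicy` and `should_retrain`, the 0.85 accuracy threshold, and the 30-day interval are hypothetical, while the 20% new-data fraction mirrors one of the examples in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrainPolicy:
    min_accuracy: float = 0.85                # assumed accuracy threshold
    max_age: timedelta = timedelta(days=30)   # assumed periodic interval
    new_data_fraction: float = 0.20           # 20% of the initial training set


def should_retrain(policy: RetrainPolicy, accuracy: float,
                   last_trained: datetime, now: datetime,
                   new_rows: int, initial_rows: int) -> list[str]:
    """Return the reasons (possibly none) for updating the trained model."""
    reasons = []
    if accuracy < policy.min_accuracy:
        reasons.append("accuracy_below_threshold")
    if now - last_trained >= policy.max_age:
        reasons.append("periodic_interval_elapsed")
    if initial_rows > 0 and new_rows / initial_rows >= policy.new_data_fraction:
        reasons.append("enough_new_data")
    return reasons
```

Returning the list of reasons rather than a single flag makes it easy to log why an update was triggered; an empty list means the deployed model is left as is.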

FIG. 2B shows block diagram 240 illustrating further details for training module 236 and ML prediction explanation module 242 included in virtualized application 220. Input data 232 may be provided as input to trained ML model 234 in inference module 234a (deployed within virtualized application 220) for generating corresponding ML model output. The ML model output may be provided to ML prediction explanation module 242 to generate information explaining the ML output.

For example, for an ML task for a retailer to predict a customer churn probability (e.g., a probability that the retailer will lose the customer in three months, six months, nine months, or any other suitable time period), ML prediction explanation module 242 may generate information including input data features ordered by importance, feature sensitivity, or any other suitable information. In another example, for an ML task for a telecom provider to predict a customer churn probability, ML prediction explanation module 242 may generate information indicating the most important features to be Product Age and Number of Complaints, which may indicate that the customer's device is too old and relatedly the customer has been making frequent complaints of bad reception or call drops. In yet another example, for a telecom provider to predict customer lifetime value (e.g., how much money the business will make from a customer over a lifetime of the customer-business relationship), ML prediction explanation module 242 may generate information including predictions for how long a period of time the customer will maintain their relationship and how much the customer will spend on the business's services during this period of time. The spend may be positive, indicating the business will turn a profit over the course of the relationship, or negative, indicating the business will lose money over the course of the relationship.

Additionally or alternatively, the input data may be provided to training module 236 for training an ML model. Training module 236 may include model training algorithms 236a for training the ML model (e.g., a linear regression model, a non-linear regression model such as neural networks, support vector machines, etc., or any other suitable ML model). Training module 236 may further include model performance and evaluation code 236b and model tuning code 236c for assessing and improving performance of the trained ML model (e.g., to achieve model performance below a certain error threshold, or any other suitable target).

FIG. 2C shows block diagram 250 illustrating further details for when ML data processing pipeline module 224 is applied to transformed business data 208 (including attribute values 208b for attributes 208a). As described above, identify software codes 210 may be executed to identify, using common data attribute schema 204 and from among pre-existing software codes 212, groups of attributes 218, 254 and associated software codes 216, 252 for ML data processing pipelines 226, 256, to generate groups of feature values 262a, 264a, for respective groups of features 262, 264, from groups of attribute values 218a, 254a. The groups of attributes may include first group of attributes 218 and second group of attributes 254. The groups of attribute values may include first group of attribute values 218a for first group of attributes 218 and second group of attribute values 254a for second group of attributes 254. The groups of features may include first group of features 262 and second group of features 264. The groups of feature values may include first group of feature values 262a for first group of features 262 and second group of feature values 264a for second group of features 264. The ML data processing pipelines include ML data processing pipeline 226, to generate first group of feature values 262a from first group of attribute values 218a using data cleansing procedures 228 and feature extraction procedures 230, and ML data processing pipeline 256 to generate second group of feature values 264a from second group of attribute values 254a using data cleansing procedures 258 different from data cleansing procedures 228 and feature extraction procedures 260 different from feature extraction procedures 230.

FIG. 2D shows block diagram 270 illustrating that ML data processing pipeline module 224 may include ML data processing pipeline 272 (including data cleansing procedures 274 and feature extraction procedures 276) that is not applied to any group of attributes. Pre-existing software codes 212 may include software code 212a for ML data processing pipeline 1, software code 212b for ML data processing pipeline 2, software code 212c for ML data processing pipeline 3, and/or software code for any other suitable ML data processing pipeline. Virtualized application 220 may be deployed to include software code for at least some ML data processing pipelines that may not be used in some instances. For example, software code 212c for ML data processing pipeline 3 is deployed in virtualized application 220 but does not have a corresponding group of attributes in transformed business data 208. Such an implementation for virtualized application 220 may be advantageous. For example, if business data provided at a later time were to include attributes relevant to software code 212c for ML data processing pipeline 3, these attributes may be mapped to software code 212c and processed using ML data processing pipeline 3 without need for deploying a new virtualized application. Such an implementation may allow for rapid deployment of new virtualized applications without need for assessing beforehand which software codes for ML data processing pipelines should be included in the virtualized application.

FIG. 2E shows block diagram 280 illustrating further details for when business data 202 is processed in accordance with common data attribute schema 204 to generate transformed business data 208. Business data 202 may include values 284 for multiple attributes 282: X₁, X₂, . . . , X_N. ETL processing code 206 may process business data 202 in accordance with common data attribute schema 204 to generate transformed business data 208. Transformed business data 208 may include values 288 for multiple attributes 286: Y₁, Y₂, . . . , Y_M (M ≤ N). Common data attribute schema 204 may indicate attributes for which corresponding software codes for ML data processing pipelines are stored in pre-existing software codes 212. It is appreciated that in some embodiments not all attributes X_i present in business data 202 may be included in common data attribute schema 204. Accordingly, values for attributes Y_i selected for transformed business data 208 may be fewer in number than values for attributes X_i present in business data 202. In some embodiments, common data attribute schema 204 may indicate a format for attribute values. For example, the format may include an ordering of the attributes, a file format (e.g., comma-separated values, EXCEL spreadsheet, etc.), or any other suitable format. It is appreciated that in some embodiments values for attributes Y_i selected for transformed business data 208 may not be in the same order as the corresponding values for attributes X_i present in business data 202. Instead, the attribute values may be ordered as specified by common data attribute schema 204.
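As a minimal sketch of the selection and reordering described above, the following Python snippet keeps only the attributes named in a schema and emits them in the schema's order. It assumes, for simplicity, that attribute names in the business data already match the schema names; the function name and the sample record are hypothetical.

```python
def transform_to_schema(record, schema_attributes):
    """Keep only schema attributes, ordered as the schema specifies."""
    # Python dicts preserve insertion order, so iterating over the schema
    # yields the transformed record in schema order.
    return {name: record[name] for name in schema_attributes if name in record}


# Hypothetical business record with N = 4 attributes, one of which
# ("Favorite Color") is not part of the schema.
business_record = {
    "Favorite Color": "blue",
    "Account Open Date": "2019-03-14",
    "Customer ID": "C-001",
    "Credit Score": 710,
}
schema_attributes = ["Customer ID", "Account Open Date", "Credit Score"]

transformed = transform_to_schema(business_record, schema_attributes)
print(list(transformed))
```

The transformed record has M = 3 attributes, which is no more than the N = 4 attributes in the original record, and their order follows the schema rather than the source data.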

FIG. 2F shows block diagram 290 illustrating another virtualized application 292 that includes business data 202 and related data processing functionality in addition to the functionality described with respect to virtualized application 220. In some embodiments, any suitable portion of the components shown in virtualized application 292 may be included within virtualized application 292 or be present at a computing system external to virtualized application 292. Thus, it should be appreciated that any such variations are within the scope of the techniques described herein and aspects of the technology described herein are not limited in this respect.

FIG. 3 is a block diagram illustrating common data attribute schema 300 used for generating and deploying an ML model in a virtualized ML application program (e.g., virtualized application 220 in FIG. 2A, or any other suitable virtualized ML application program). The common data attribute schema may be defined as a collection of one or more attributes. For example, the common data attribute schema may be a list of attributes for data that may be collected from a business. The common data attribute schema may indicate a format for attribute values. For example, the format may include an ordering of the attributes, a file format (e.g., comma-separated values, EXCEL spreadsheet, etc.), or any other suitable format. The common data attribute schema may be used to map data received from the business to one or more software codes from pre-existing software codes for ML data processing pipelines. The received data may be automatically processed with the identified software code(s) to generate input data for training an ML model and/or using the ML model for inference.

Common data attribute schema 300 categorizes attributes into multiple categories, such as common attribute category 302, market segment attribute category 304, business specific attribute category 306, and/or any other suitable attribute category, such as additional attribute category 308. For example, a common data attribute schema for an ML task for a retailer may specify Customer ID as a common attribute, and a common data attribute schema for an ML task for a telecom provider may also specify Customer ID as a common attribute, where each of the retailer and the telecom provider can provide this attribute. In another example, the common data attribute schema for an ML task for a retailer may specify Product Purchased as a market segment attribute provided by the retailer, while the common data attribute schema for an ML task for a telecom provider may specify Service Subscribed as a market segment attribute provided by the telecom provider. In yet another example, the common data attribute schema for an ML task for a first retailer may specify Product Model as a business specific attribute provided by the first retailer, while the common data attribute schema for an ML task for a second retailer may specify Product SKUID as a business specific attribute provided by the second retailer. In yet another example, the common data attribute schema for an ML task for a second retailer may specify a new attribute without a corresponding ML data processing pipeline as an additional attribute. After software code for the corresponding ML data processing pipeline is provided, the new attribute may be reclassified as a common attribute, a market segment attribute, or a business specific attribute. In some embodiments, common data attribute schema 300 may be updated to include an attribute that was not previously present in common data attribute schema 300 (e.g., a common data attribute schema for an ML task for a first retailer may be updated to include an attribute when being used for an ML task for a second retailer or another business).

FIG. 4 is a block diagram illustrating common data attribute schema 400 used for generating and deploying an ML model in a virtualized ML application program (e.g., virtualized application 220 in FIG. 2A, or any other suitable virtualized ML application program). Common data attribute schema 400 categorizes attributes into multiple categories, such as common attribute category 402, market segment attribute category 404, business specific attribute category 406, and/or any other suitable attribute category, such as additional attribute category 408. Further, common data attribute schema 400 indicates which attributes are mandatory or optional. In some embodiments, mandatory attributes may be required in order for the data to be processed by an ML data processing pipeline and may require the processor to generate an error notification when values are missing for one or more mandatory attributes (e.g., common data attribute schema 400 specifies Customer ID as a common attribute that is mandatory). For example, the processor may perform these acts as part of a data quality check in order to ensure that values for mandatory attributes are available from the business. In some embodiments, optional attributes may not be required in order for the data to be processed by an ML data processing pipeline. The processor may ignore the optional attribute, replace a missing value for the optional attribute with an average value, a default value, or any other suitable value for the optional attribute, or otherwise address the missing value for the optional attribute without generating an error notification (e.g., common data attribute schema 400 specifies Credit Score as a common attribute that is optional).
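The handling of mandatory and optional attributes can be sketched as follows. This is an illustrative Python example only: the dictionary layout of the schema, the function name, and the default Credit Score of 650 are hypothetical assumptions, not details of common data attribute schema 400.

```python
# Hypothetical in-memory form of a schema with mandatory/optional flags.
SCHEMA = {
    "Customer ID": {"mandatory": True},
    "Credit Score": {"mandatory": False, "default": 650},  # assumed default
}


def check_record(record, schema):
    """Return (cleaned_record, errors) for one record under the schema."""
    errors = []
    cleaned = dict(record)  # do not mutate the caller's record
    for name, rule in schema.items():
        if cleaned.get(name) is None:
            if rule["mandatory"]:
                # Missing mandatory value: report it as an error notification.
                errors.append(f"missing mandatory attribute: {name}")
            else:
                # Missing optional value: substitute a default, no error.
                cleaned[name] = rule.get("default")
    return cleaned, errors


ok_record, ok_errors = check_record({"Customer ID": "C-1"}, SCHEMA)
bad_record, bad_errors = check_record({"Credit Score": 700}, SCHEMA)
print(ok_record, ok_errors)
print(bad_record, bad_errors)
```

Here the first record passes with the optional Credit Score filled in, while the second produces an error because the mandatory Customer ID is absent.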

FIG. 5A shows block diagram 500 illustrating a virtualized application platform 506 including containerized application programs 508 for different ML tasks. For example, containerized application programs 508 may be configured to execute in corresponding containers, such as DOCKER containers, MOBY containers, etc. Virtualized application platform 506 may be configured to receive data from a source 502 (e.g., a data store, a relational database, an object oriented database, a flat file, Hadoop, or any other suitable source of data) external to virtualized application platform 506 via communication network 504 (e.g., the Internet, a local network, or any other suitable communication network). Virtualized application platform 506 may be configured to execute virtualized application 220 (shown in FIG. 2A) or any other suitable virtualized ML application program. Source 502 external to virtualized application platform 506 may include a data store that is part of a computing system that does not execute containerized application programs 508.

In some embodiments, each of containerized application programs 508 is abstracted at the application layer and may package together software code for an application for an ML task and optional dependencies (e.g., shared libraries, etc.). Because the containerized application program isolates the software code from its environment, the containerized application program may be executed in a uniform manner across multiple environments despite differences between underlying host hardware 514 or host operating system 512. Multiple containerized application programs 508 may be executed as isolated processes managed by container management engine 510 but running on the same host hardware 514 and sharing the same host operating system 512. In some embodiments, container management engine 510 is software that may be configured to execute multiple containerized application programs 508 corresponding to different ML tasks. For example, virtualized application platform 506 may be configured to execute multiple containerized application programs 508 corresponding to different ML tasks for the same business, for different businesses, or a suitable combination thereof.

FIG. 5B shows a block diagram 550 illustrating virtualized application platform 552 including application programs 556 for different ML tasks being executed on corresponding virtual machines 554 (e.g., VMWARE virtual machines, AZURE virtual machines, etc.). Virtualized application platform 552 may be configured to receive data from a source 502 (e.g., a data store, a relational database, an object oriented database, a flat file, Hadoop, or any other suitable source of data) external to virtualized application platform 552 via communication network 504 (e.g., the Internet, a local network, or any other suitable communication network). Virtualized application platform 552 may be configured to execute virtualized application 220 (shown in FIG. 2A) or any other suitable virtualized ML application program. Source 502 external to virtualized application platform 552 may include a data store that is part of a computing system that does not execute virtual machines 554 or application programs 556.

In some embodiments, application programs 556 execute on corresponding virtual machines 554 and guest operating systems 558. Virtual machines 554 are abstractions of physical hardware and may be used to emulate multiple pieces of hardware on the same host hardware 514. Virtual machine management engine 560 is software or hardware that may create and manage one or more virtual machines to run on host hardware 514. Each virtual machine may include a corresponding application program, optional dependencies (e.g., shared libraries, etc.), and its own copy of the guest operating system. Multiple application programs 556 may be executed within virtual machines 554 (with corresponding guest operating systems 558) and managed by virtual machine management engine 560 but running on the same host hardware 514. In some embodiments, virtual machine management engine 560 may be configured to execute multiple virtual machines 554 with application programs 556 corresponding to different ML tasks. For example, virtualized application platform 552 may be configured to execute multiple virtual machines 554 with application programs 556 corresponding to different ML tasks for the same business, for different businesses, or a suitable combination thereof.

FIG. 6A is a block diagram illustrating data cleansing procedures 600 for a virtualized ML application program (e.g., virtualized application 220 in FIG. 2A, or any other suitable virtualized ML application program). Software code implementing an ML data processing pipeline (e.g., ML data processing pipeline 226 in FIG. 2A) may be configured to, when executed, process a group of attribute values using data cleansing procedures, including outlier detection code 602, data normalization code 604, data quality checking code 606, data enrichment code 608, any other suitable data cleansing procedures, or no data cleansing procedures where no data cleansing may be required.

In some embodiments, outlier detection code 602 may process the group of attribute values to identify extreme values that deviate from other observations in the data, which may indicate variability in a measurement, experimental errors, or a novelty. Outlier detection code 602 may implement one or more methods for outlier detection including z-score or extreme value analysis, probabilistic and statistical modeling, linear regression models, proximity based models, information theory models, high dimensional outlier detection methods, and/or any other suitable outlier detection method.
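Of the methods listed above, z-score analysis is the simplest to illustrate. The following Python sketch flags values whose distance from the mean exceeds a chosen number of standard deviations. The function name, the threshold, and the sample values are hypothetical, and the use of the population standard deviation is an assumption made for this example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical attribute values; the last one is far from the rest.
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(values, threshold=2.5))
```

A lower threshold than the conventional 3.0 is passed here because, with the population standard deviation, a sample of only ten values cannot produce a z-score above roughly 2.85.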

In some embodiments, data normalization code 604 may process the group of attribute values to transform the values so that they are dimensionless and/or have similar distributions. Data normalization code 604 may implement one or more methods for data normalization including min-max normalization, mean normalization, z-score normalization, and/or any other suitable data normalization method.
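Two of the normalization methods named above can be sketched in a few lines of Python. The function names and sample values are hypothetical; min-max normalization maps values to the range [0, 1], and z-score normalization (here using the population standard deviation, an assumption) yields values with zero mean and unit variance.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40]  # hypothetical attribute values
scaled = min_max_normalize(ages)
standardized = z_score_normalize(ages)
print(scaled)
print(standardized)
```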

In some embodiments, data quality checking code 606 may process the group of attribute values to determine whether a value is missing for an attribute identified as mandatory in the common data attribute schema. For example, data quality checking code 606 may generate an error notification when the group of attribute values does not include a value for a mandatory attribute. This data quality check may be performed in order to ensure that values for mandatory attributes are available from the business. Such missing values may impact the accuracy of the trained ML model, or worse, prevent initial training of the ML model.

In some embodiments, data enrichment code 608 may process the group of attribute values to identify missing or incomplete values and enhance existing information by supplementing missing or incomplete values. For example, data enrichment code 608 may replace a missing value for an optional attribute with an average value for the attribute (e.g., a missing value for a Customer Age attribute may be replaced with an average value), a default value (e.g., a missing value for a ZIP code attribute may be replaced with 00000 or another default value), or any other suitable value. In another example, data enrichment code 608 may determine a missing value from the value for another attribute, e.g., for a missing value for attribute Customer Age, data enrichment code 608 may determine the customer's age from a value for a Date Of Birth attribute for the customer.
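The two enrichment strategies described above for Customer Age can be combined in a short Python sketch. The function names, the record layout, and the ordering of the two strategies (derive from Date Of Birth first, then fall back to the rounded average) are assumptions made for this example.

```python
from datetime import date
from statistics import mean


def age_on(date_of_birth, today):
    """Age in whole years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def enrich_customer_age(records, today):
    """Fill missing Customer Age from Date Of Birth, else with the average."""
    enriched = [dict(r) for r in records]  # copy so inputs are not mutated
    for r in enriched:
        if r.get("Customer Age") is None and r.get("Date Of Birth") is not None:
            r["Customer Age"] = age_on(r["Date Of Birth"], today)
    known = [r["Customer Age"] for r in enriched
             if r.get("Customer Age") is not None]
    fallback = round(mean(known)) if known else None
    for r in enriched:
        if r.get("Customer Age") is None:
            r["Customer Age"] = fallback
    return enriched


# Hypothetical records: one complete, one with only a birth date, one empty.
records = [
    {"Customer Age": 40},
    {"Customer Age": None, "Date Of Birth": date(1990, 8, 15)},
    {"Customer Age": None},
]
enriched = enrich_customer_age(records, today=date(2021, 7, 30))
print([r["Customer Age"] for r in enriched])
```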

FIG. 6B is a block diagram illustrating feature extraction procedures 650 for a virtualized ML application program. Software code implementing an ML data processing pipeline (e.g., ML data processing pipeline 226 in FIG. 2A) may be configured to, when executed, process a group of attribute values using feature extraction procedures (e.g., code for determining values for a first group of features 652, code for determining values for a second group of features 654, and so on) to generate a group of feature values. For example, the group of attribute values may include a value for attribute, Account Open Date, for a customer of a retailer business. The group of features may include Account Age, Account Open Year, Account Open Month, Days To Next Renewal, or any other suitable feature. Code for determining values for the group of features may determine these feature values based on the value for attribute, Account Open Date.
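A minimal Python sketch of feature extraction from the Account Open Date attribute is shown below. The function name is hypothetical, and measuring Account Age in days is an assumption. Days To Next Renewal is omitted because it depends on renewal terms that are not specified here.

```python
from datetime import date


def account_features(account_open_date, as_of):
    """Derive several feature values from a single Account Open Date value."""
    return {
        # Account Age is measured in days here (an assumption).
        "Account Age": (as_of - account_open_date).days,
        "Account Open Year": account_open_date.year,
        "Account Open Month": account_open_date.month,
    }


# Hypothetical attribute value and reference date.
features = account_features(date(2019, 3, 14), as_of=date(2021, 7, 30))
print(features)
```

One attribute value thus expands into a group of feature values, which is the pattern the feature extraction procedures describe.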

FIG. 7 is a flowchart of an illustrative process 700 for generating and deploying ML applications for multiple businesses. Business 1 data (704) may be received from a business computing environment for business 1 (702) (e.g., a first retailer). Common data attribute schema 1 (706) (e.g., as described with respect to FIG. 3, or any other suitable common data attribute schema) may indicate one or more attributes for which attribute values may be generated from business 1 data (704). Common data attribute schema 1 (706) may already exist or may be created or updated based on attributes in business 1 data (704). ETL processing code (e.g., ETL processing code 206 in FIG. 2A, or any other suitable processing code) may be executed to process business 1 data (704) in accordance with common data attribute schema 1 (706) and store the results as business 1 transformed data (708). Virtualized application 1 (710) (e.g., virtualized application 220 in FIG. 2A, or any other suitable virtualized ML application program) may receive and process business 1 transformed data (708) to generate input data for training an ML model and/or using the ML model for inference. In some embodiments, software code for one or more ML data processing pipelines within virtualized application 1 (710) may determine that values for one or more mandatory attributes are missing or incomplete. Virtualized application 1 (710) may generate an error notification and send a request to the business computing environment for business 1 (702) to supply values for the mandatory attributes. In some embodiments, virtualized application 1 (710) may provide ML output and/or information explaining the ML output to the business computing environment for business 1 (702).

At a later time, business 2 data (714) may be received from a business computing environment for business 2 (712) (e.g., a second retailer). Common data attribute schema 1 (706) may indicate one or more attributes for which attribute values may be generated from business 2 data (714). Because common data attribute schema 1 (706) already exists and the same attributes may be available in business 2 data (714), no further time or effort may need to be expended to update common data attribute schema 1 (706). Business 2 data (714) may be automatically processed including executing ETL processing code to generate business 2 transformed data (718) and providing business 2 transformed data (718) to virtualized application 2 (720) to generate input data for training an ML model and/or using the ML model for inference. In some embodiments, software code for one or more ML data processing pipelines within virtualized application 2 (720) may determine that values for one or more mandatory attributes are missing or incomplete. Virtualized application 2 (720) may generate an error notification and send a request to the business computing environment for business 2 (712) to supply values for the mandatory attributes. In some embodiments, virtualized application 2 (720) may provide ML output and/or information explaining the ML output to the business computing environment for business 2 (712).

At a later time, business 3 data (724) may be received from a business computing environment for business 3 (722) (e.g., a telecom provider). Common data attribute schema 1 (706), which indicates one or more attributes for which attribute values may be generated from business 1 data (704) or business 2 data (714), may need to be updated to include one or more new attributes from business 3 data (724). Common data attribute schema 2 (726) may be generated accordingly. Further, ETL processing code and/or software codes for ML data processing pipelines may need to be created or updated for processing values for the new attributes added to common data attribute schema 2 (726). The updated ETL processing code may be executed to process business 3 data (724) in accordance with common data attribute schema 2 (726) and store the results as business 3 transformed data (728). Virtualized application 3 (730) may receive and process business 3 transformed data (728) to generate input data for training an ML model and/or using the ML model for inference. In some embodiments, the updated software codes for one or more ML data processing pipelines within virtualized application 3 (730) may determine that values for one or more mandatory attributes are missing or incomplete. Virtualized application 3 (730) may generate an error notification and send a request to the business computing environment for business 3 (722) to supply values for the mandatory attributes. In some embodiments, virtualized application 3 (730) may provide ML output and/or information explaining the ML output to the business computing environment for business 3 (722).

FIGS. 8A-8C are block diagrams illustrating changes made to a common data attribute schema based on business data received from different businesses. FIG. 8A illustrates common data attribute schema 800 for business 1 (e.g., a first retailer). Common data attribute schema 800 includes common attributes 802, market segment attributes 804, business specific attributes 806, and additional attributes 808. ETL processing code (e.g., ETL processing code 206 in FIG. 2A, or any other suitable processing code) may be executed to process business data for business 1 in accordance with common data attribute schema 800.

At a later time, business data from business 2 (e.g., a second retailer) may be received. Because common data attribute schema 800 already exists and shares some attributes present in business data from business 2, common data attribute schema 800 may be updated to generate common data attribute schema 810. FIG. 8B illustrates common data attribute schema 810 generated for business 2. Both businesses are in the same market segment but may have different business specific attributes. While business 1 uses Product Model as an attribute, business 2 may use a different attribute, Product SKUID, for capturing similar information. This change is reflected in business specific attributes 816, while common attributes 812, market segment attributes 814, and additional attributes 818 remain unchanged. After the change, ETL processing code may be executed to process business data for business 2 in accordance with common data attribute schema 810.

At a later time, business data from business 3 (e.g., a telecom provider) may be received. Common data attribute schema 810 for business 2 may be further updated to generate common data attribute schema 820 for business 3. FIG. 8C illustrates common data attribute schema 820 generated for business 3. These businesses may have differences in market segment attributes and business specific attributes. While business 2 uses Product Purchased as a market segment attribute, business 3 may use a different market segment attribute, Service Subscribed, to capture similar information. Similarly, while business 2 uses Product SKUID as a business specific attribute, business 3 may use a different business specific attribute, Service Subscription Start Date. These changes are reflected in market segment attributes 824 and business specific attributes 826, while common attributes 822 and additional attributes 828 remain unchanged.

After the change, ETL processing code may be executed to process business data for business 3 in accordance with common data attribute schema 820.

FIG. 9 is a flowchart of process 900 for processing data based on a common data attribute schema for using or training an ML model. At least some of the acts of process 900 may be performed by any suitable computing device(s) and, for example, may be performed by one or more of processor(s) 1202 shown in computing system 1200 of FIG. 12.

In act 902, process 900 receives first data associated with a business (e.g., a retailer or any other suitable business). In some embodiments, the first data may include first values for first attributes (e.g., attributes related to a customer of a retailer, such as Customer ID, Account Open Date, etc., or any other suitable attribute).

After act 902, process 900 proceeds to act 904, where process 900 processes the first data, in accordance with a common data attribute schema (e.g., as described with respect to FIG. 3, or any other suitable common data attribute schema) that indicates second attributes, to generate second values for at least some of the second attributes. In some embodiments, at least some of the second attributes for which the second values are generated may include a first group of attributes. The second values for at least some of the second attributes may include a first group of attribute values for the first group of attributes. In some embodiments, ETL processing code (e.g., ETL processing code 206 in FIG. 2A, or any other suitable code) may be executed to process the first data (e.g., business data 202 in FIG. 2A, or any other suitable data) in accordance with the common data attribute schema (e.g., common data attribute schema 204 in FIG. 2A, or any other suitable schema) to generate the second values for at least some of the second attributes (e.g., transformed business data 208 in FIG. 2A, or any other suitable data). It is appreciated that in some embodiments not all attributes present in the first data may be included in the common data attribute schema. Accordingly, the second values for at least some of the second attributes may be fewer in number than values for attributes present in the first data. In some embodiments, the common data attribute schema may indicate a format for attribute values. For example, the format may include an ordering of the attributes, a file format (e.g., comma-separated values, EXCEL spreadsheet, etc.), or any other suitable format. It is appreciated that in some embodiments the second values for at least some of the second attributes may not be in the same order as the corresponding values for attributes present in the first data. Instead, the attribute values may be ordered as specified by the common data attribute schema.

In some embodiments, the common data attribute schema may indicate which attributes in the second attributes are mandatory or optional. For example, mandatory attributes may be required in order for the data to be processed by an ML data processing pipeline and may require the processor to generate an error notification when values are missing for one or more mandatory attributes (e.g., a common data attribute schema for an ML task for a retailer may specify Customer ID or any other suitable attribute as a mandatory attribute; also described with respect to FIG. 4). In another example, optional attributes may not be required in order for the data to be processed by an ML data processing pipeline. The processor may ignore the optional attribute, replace a missing value for the optional attribute with an average value, a default value, or any other suitable value for the optional attribute, or otherwise address the missing value for the optional attribute without generating an error notification. In some embodiments, the common data attribute schema indicates a format for the second values. The act of processing the first data in accordance with the common data attribute schema includes formatting the accessed values according to the format indicated by the common data attribute schema. For example, the format may include an ordering of the attributes, a file format (e.g., comma-separated values, EXCEL spreadsheet, etc.), or any other suitable format.

After act 904, process 900 proceeds to act 906, where process 900 identifies, using the common data attribute schema and from among pre-existing software codes (e.g., software codes for implementing one or more ML data processing pipelines; also described with respect to FIG. 2D), first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values. In some embodiments, the common data attribute schema may indicate one or more attributes (or groups of attributes) for which corresponding software codes for ML data processing pipelines are included in the pre-existing software codes. Using the common data attribute schema, the first group of attribute values may be provided to corresponding software code implementing the first ML data processing pipeline to generate the first group of feature values.

In some embodiments, the first software code implementing the first ML data processing pipeline may be configured to, when executed, generate the first group of feature values from the first group of attribute values using first data cleansing procedures (e.g., outlier detection code, data normalization code, data quality checking code, data enrichment code, any other suitable data cleansing procedures, or no data cleansing procedures where no data cleansing may be required; also described with respect to FIG. 6A) and first feature extraction procedures (e.g., code for determining values for a first group of features, code for determining values for a second group of features, and so on; also described with respect to FIG. 6B).

After act 906, process 900 proceeds to act 908, where process 900 processes the first group of attribute values with the first software code to obtain the first group of feature values. In some embodiments, the first software code is for an ML data processing pipeline (e.g., ML data processing pipeline 226 in FIG. 2A, or any other suitable ML data processing pipeline) and may be executed to process the first group of attribute values (which correspond to the ML data processing pipeline) to obtain the first group of feature values. For example, the first group of attribute values may include a value for the attribute Account Open Date for a customer of a retailer business. The respective first group of features may include Account Age, Account Open Year, Account Open Month, Days To Next Renewal, or any other suitable feature.
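
The Account Open Date example may be sketched in Python as follows. The function name `account_features`, the feature keys, and the choice to measure Account Age in days relative to a caller-supplied reference date are assumptions of this illustration. Days To Next Renewal is omitted because it would require a renewal date that is not part of this example.

```python
from datetime import date


def account_features(account_open_date, as_of):
    """Derive a group of feature values from one Account Open Date value."""
    return {
        # Account Age, measured here in whole days up to the reference date.
        "account_age_days": (as_of - account_open_date).days,
        "account_open_year": account_open_date.year,
        "account_open_month": account_open_date.month,
    }


features = account_features(date(2019, 3, 15), as_of=date(2021, 7, 30))
```

Passing the reference date explicitly, rather than reading the system clock inside the function, keeps the feature values reproducible between training and inference.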

In some embodiments, the acts of identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline and processing the first group of attribute values with the first software code may be performed automatically based on information in the common data attribute schema (e.g., without receiving additional user input for making this identification). For example, the information in the common data attribute schema may aid in mapping groups of attributes to ML data processing pipelines (e.g., as described with respect to FIG. 2C, or any other suitable information) for automatically making the identification.
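
A minimal sketch of such automatic identification, assuming the schema stores a pipeline name for each group of attributes, is given below in Python. The group names, the pipeline names, and the registry `PIPELINES` are hypothetical.

```python
def account_pipeline(values):
    """Hypothetical pre-existing pipeline for the account attribute group."""
    return {"account_open_year": int(values["account_open_date"][:4])}


def spend_pipeline(values):
    """Hypothetical pre-existing pipeline for the spend attribute group."""
    return {"monthly_spend": float(values["monthly_spend"])}


# Registry of pre-existing software codes, keyed by pipeline name.
PIPELINES = {
    "account_pipeline": account_pipeline,
    "spend_pipeline": spend_pipeline,
}

# Hypothetical schema information mapping attribute groups to pipelines.
SCHEMA_GROUPS = {
    "account": {"attributes": ["account_open_date"], "pipeline": "account_pipeline"},
    "spend": {"attributes": ["monthly_spend"], "pipeline": "spend_pipeline"},
}


def run_pipelines(record):
    """Identify and run the pipeline for every group present in the record."""
    features = {}
    for group in SCHEMA_GROUPS.values():
        values = {name: record[name] for name in group["attributes"] if name in record}
        if len(values) < len(group["attributes"]):
            continue  # the record lacks this group, so its pipeline is skipped
        pipeline = PIPELINES[group["pipeline"]]  # lookup needs no user input
        features.update(pipeline(values))
    return features
```

Because the lookup is driven entirely by the schema entries, supporting a new group of attributes in this sketch amounts to registering one more pipeline and one more schema entry.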

After act 908, process 900 proceeds to act 910, where process 900 either: (i) provides the first group of feature values as inputs to an ML model for generating corresponding ML model outputs, or (ii) uses the first group of feature values to train the ML model (e.g., to initially train the ML model or to update a trained ML model). For example, the first group of feature values may be provided as input (e.g., input data 232 in FIG. 2A, or any other suitable input) to a trained ML model (e.g., trained ML model 234 in FIG. 2A, or any other suitable ML model) for generating corresponding ML model outputs. In another example, the first group of feature values may be used as training data. The ML model may include multiple parameters (e.g., hyperparameters or any other suitable model parameters). Training the ML model may include estimating values of at least some of the parameters using the training data (e.g., using a gradient descent algorithm, or any other suitable algorithm). In some embodiments, the values of at least some of the parameters may be estimated using one or more software libraries (e.g., TENSORFLOW, KERAS, or any other suitable software library).
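
To make the parameter estimation concrete, the following Python sketch fits a one-feature linear model by batch gradient descent on a mean squared error loss. It uses only the standard library rather than any of the libraries named above, and the learning rate, the number of epochs, and the toy data are assumptions chosen by hand for this illustration.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Estimate weight w and bias b of y = w * x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_model(xs, ys)
```

On this toy data the estimated values approach w = 2 and b = 1. A different learning rate or model form would change how quickly, or whether, the estimates converge.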

In some embodiments, the above-described acts may be performed by a virtualized ML application program executing using at least one processor. In some embodiments, a virtualized application program may be any application program configured to execute on a virtual machine (e.g., a VMWARE virtual machine, an AZURE virtual machine, etc.). In some embodiments, a virtualized application program may be a containerized application program configured to execute in a container such as a DOCKER container, a MOBY container, etc.

It should be appreciated that process 900 is illustrative and that there are variations. In some embodiments, one or more of the acts of process 900 may be optional or be performed in a different order than shown in FIG. 9. For example, act 904 and act 906 may be performed in a different order. Alternatively, act 904 and act 906 may be performed in parallel.

FIG. 10 is a flowchart of process 1000 for using virtualized ML application programs associated with different ML tasks (e.g., different tasks for one customer, different tasks for different customers, or a suitable combination thereof) using a computer hardware processor configured to execute virtualized application programs (e.g., a virtual machine configured to execute an ML application program, such as a VMWARE virtual machine or any other suitable virtual machine; a containerized application program, such as a DOCKER container application or any other suitable containerized application program; or any other suitable virtualized application program). At least some of the acts of process 1000 may be performed by any suitable computing device(s) and, for example, may be performed by one or more of processor(s) 1202 shown in computing system 1200 of FIG. 12.

In act 1002, process 1000 loads a virtualized ML application program including multiple software modules. In some embodiments, the multiple software modules may include a first software module configured to apply one or more ML data processing pipelines to received data to generate input data (e.g., a group of feature values for a respective group of features, or any other suitable input data) for providing as input to an ML model (e.g., a linear regression model, a non-linear regression model such as neural networks, support vector machines, etc., or any other suitable ML model) associated with a respective ML task. In some embodiments, the multiple software modules may further include a second software module configured to perform inference using the ML model. In some embodiments, the multiple software modules may include a third software module configured to train the ML model (e.g., initially train or update the ML model). In some embodiments, the multiple software modules include a fourth software module configured to generate information explaining performance of the ML model.

After act 1002, process 1000 proceeds to act 1004, where process 1000 uses the first software module to: (a) receive data, via at least one communication network, from a source external to the virtualized ML application program (e.g., a data store, a relational database, an object oriented database, a flat file, Hadoop, or any other suitable source of data); and (b) process the received data using the one or more ML data processing pipelines to generate the input data. In some embodiments, software code for an ML data processing pipeline may be executed to process received data including attribute values for a group of attributes corresponding to the ML data processing pipeline. For example, software code for the ML data processing pipeline (e.g., ML data processing pipeline 226 in FIG. 2A, or any other suitable ML data processing pipeline) may be executed to process received data (e.g., attribute values for a group of attributes, or any other suitable data) and generate input data (e.g., feature values for a group of features, or any other suitable data). In some embodiments, the input data may be provided as input for training an ML model or processing with a trained ML model. In some embodiments, software code for the ML data processing pipeline may include data cleansing procedures, feature extraction procedures, and/or any other code for processing received data to generate input data. The data cleansing procedures may include outlier detection code, data normalization code, data quality checking code, data enrichment code, and/or any other suitable code. The feature extraction procedures may include code for determining values for one or more groups of features. Further details for data cleansing procedures and feature extraction procedures are described with respect to FIGS. 6A and 6B, respectively.
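
By way of a non-limiting example, two of the data cleansing procedures named above, outlier detection and data normalization, may be chained as in the Python sketch below. The z-score rule, its threshold of 2.0, the min-max scaling, and the sample values are assumptions chosen for illustration, and other rules may be substituted.

```python
from statistics import mean, pstdev


def remove_outliers(values, z_max=2.0):
    """Drop values lying more than z_max standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values are identical, so nothing is an outlier
    return [v for v in values if abs(v - mu) / sigma <= z_max]


def min_max_normalize(values):
    """Rescale values linearly so they fall in the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pipeline(values):
    """Apply outlier detection first, then normalization."""
    return min_max_normalize(remove_outliers(values))


# Hypothetical attribute values; 500.0 stands in for a data-entry error.
spend = [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]
features = pipeline(spend)
```

Removing the outlier before normalizing matters in this sketch: if 500.0 were kept, min-max scaling would compress the remaining values into a narrow band near zero.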

After act 1004, process 1000 proceeds to act 1006, where process 1000 uses the second software module to apply the ML model to the input data generated using the first software module to produce corresponding ML model output. For example, input data (e.g., input data 232 in FIG. 2A, or any other suitable data) may be provided as input to a trained ML model (e.g., trained ML model 234 in FIG. 2A, or any other suitable ML model) for generating corresponding ML model output.

In some embodiments, after act 1006, process 1000 may end. In some embodiments, after act 1006, process 1000 may proceed to act 1008, where process 1000 uses the third software module to train the ML model using the input data. In some embodiments, the input data may include training data having training inputs and corresponding ground-truth outputs.

The ML model may include multiple parameters (e.g., hyperparameters or any other suitable model parameters). The act of using the third software module to train the ML model may include estimating values of at least some of the parameters using the training data (e.g., using a gradient descent algorithm, or any other suitable algorithm). In some embodiments, the act of estimating values of at least some of the parameters may be performed using one or more software libraries (e.g., TENSORFLOW, KERAS, or any other suitable software library) that are part of the third software module in the virtualized ML application program.

In some embodiments, after act 1008, process 1000 may end. In some embodiments, after act 1008, process 1000 may proceed to act 1010, where process 1000 uses the fourth software module to generate information explaining performance of the ML model on the input data. For example, for an ML task for a retailer to predict a customer churn probability (e.g., a probability that the retailer will lose the customer in three months, six months, nine months, or any other suitable time period), the generated information may include input data features ordered by importance, feature sensitivity, or any other suitable information. In another example, for an ML task for a telecom provider to predict a customer churn probability, the ML output may indicate a high customer churn probability and the generated information may indicate the most important features to be Product Age and Number of Complaints (e.g., which may indicate that the customer's device is too old and, relatedly, the customer has been making frequent complaints of bad reception or call drops, which may help explain the high customer churn probability).
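
The telecom example may be illustrated with the Python sketch below, which estimates feature sensitivity by finite differences and orders features by an importance score. The logistic churn model, its hand-picked weights, and the scoring rule (sensitivity multiplied by the feature value) are assumptions of this sketch and do not represent a trained model or a prescribed explanation method.

```python
import math

# Hypothetical, hand-picked weights standing in for a trained churn model.
WEIGHTS = {"product_age_years": 0.8, "number_of_complaints": 0.6, "monthly_spend": -0.01}
BIAS = -3.0


def churn_probability(features):
    """Logistic model returning a churn probability between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def feature_sensitivity(model, features, rel_step=0.01):
    """Approximate the change in model output per unit change in each feature."""
    base = model(features)
    sensitivity = {}
    for name, value in features.items():
        step = rel_step * abs(value) if value != 0 else rel_step
        perturbed = dict(features)
        perturbed[name] = value + step
        sensitivity[name] = (model(perturbed) - base) / step
    return sensitivity


def rank_by_importance(model, features):
    """Order features by |sensitivity * value|, most important first."""
    sensitivity = feature_sensitivity(model, features)
    return sorted(
        sensitivity,
        key=lambda name: abs(sensitivity[name] * features[name]),
        reverse=True,
    )


customer = {"product_age_years": 4.0, "number_of_complaints": 3.0, "monthly_spend": 30.0}
probability = churn_probability(customer)
ranking = rank_by_importance(churn_probability, customer)
```

For this hypothetical customer the ranking places Product Age first and Number of Complaints second, mirroring the explanation given in the example above.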

It should be appreciated that process 1000 is illustrative and that there are variations. In some embodiments, one or more of the acts of process 1000 may be optional or be performed in a different order than shown in FIG. 10. For example, act 1008 and act 1010 may be optional. Additionally or alternatively, act 1006, act 1008, and act 1010 may be performed in a different order.

FIG. 11 is a block diagram 1100 illustrating techniques for generating and deploying an ML model in a virtualized ML application program. Business data may be received from source 1102 external to virtualized application 1110, or any other suitable data source. The business data may include values for one or more attributes (e.g., attributes related to a customer of a retailer, such as Customer ID, Account Open Date, etc., or any other suitable attribute). Common data attribute schema 1106 (e.g., as described with respect to FIG. 3, or any other suitable common data attribute schema) may indicate one or more attributes for which attribute values may be generated from the business data. Common data attribute schema 1106 may further indicate, for the attributes, that corresponding software codes for ML data processing pipelines (e.g., as described with respect to FIG. 2D, or any other suitable ML data processing pipelines) are stored among the pre-existing software codes. Extract, Transform, Load processing code 1104 may be executed to process the business data in accordance with common data attribute schema 1106. Identify software codes 1108 may be executed to identify, using common data attribute schema 1106 and from among pre-existing software codes, groups of attributes and associated software codes for ML data processing pipelines. This information may be used to provide the processed business data to virtualized application 1110.

The processed business data may be provided as input to virtualized application 1110, which may include an ML data processing pipeline module for processing received data. Software code for an ML data processing pipeline may include data cleansing procedures 1112, feature extraction procedures 1114, and/or any other code for processing input data to be provided for training an ML model or processed with a trained ML model. The input data may be provided as input to a trained ML model in model inference module 1116 (deployed within virtualized application 1110) for generating corresponding ML model output. The ML model output may be provided to produce insights module 1118 to generate information explaining the ML output. Additionally or alternatively, the input data may be provided to training module 1120 for training the ML model. Training module 1120 may include model training algorithms 1122 for training an ML model (e.g., a linear regression model, a non-linear regression model such as neural networks, support vector machines, etc., or any other suitable ML model). Training module 1120 may further include model performance and evaluation code 1124 and model tuning code 1126 for assessing and improving performance of the trained ML model (e.g., to achieve model performance below a certain error threshold, or any other suitable target).

For example, for an ML task for a retailer to predict a customer churn probability (e.g., a probability that the retailer will lose the customer in three months, six months, nine months, or any other suitable time period), produce insights module 1118 may generate information including input data features ordered by importance, feature sensitivity, or any other suitable information. In some embodiments, the ML model includes a multi-layer neural network configured to detect objects in images, the input data includes an input image, and produce insights module 1118 may generate information explaining performance of the multi-layer neural network on the input image. For example, in the case of a deep learning based object detection task, the input may be a captured image, and the output may be an object type for the input (e.g., car, human, cow, truck, house, or any other suitable object type). In this example, the information explaining performance of the neural network may include an importance of each pixel and feature sensitivity for each color channel of each pixel. In some embodiments, the information explaining the performance of the multi-layer neural network on the input image includes information indicating, for at least some pixels in the input image, relative degrees to which the at least some of the pixels influenced the ML model output and information indicating, for each of the at least some of the pixels, a sensitivity of the ML model output to changes in a value of the pixel.
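
A per-pixel explanation of this kind may be approximated, under strong simplifying assumptions, as in the Python sketch below. The function `toy_score` is a hypothetical stand-in for a neural network output on a 3x3 single-channel image. Pixel influence is estimated by occlusion (replacing a pixel with a baseline value) and pixel sensitivity by a finite difference, and both techniques are choices made for this example.

```python
# Hypothetical fixed weights: the toy scorer depends mostly on the centre pixel.
PIXEL_WEIGHTS = [
    [0.0, 0.1, 0.0],
    [0.1, 0.6, 0.1],
    [0.0, 0.1, 0.0],
]


def toy_score(image):
    """Stand-in for an ML model output computed from a 3x3 grayscale image."""
    return sum(PIXEL_WEIGHTS[r][c] * image[r][c] for r in range(3) for c in range(3))


def with_pixel(image, row, col, value):
    """Return a copy of the image with one pixel replaced."""
    copy = [list(r) for r in image]
    copy[row][col] = value
    return copy


def occlusion_importance(model, image, baseline=0.0):
    """Influence of each pixel: output change when the pixel is occluded."""
    base = model(image)
    return [
        [abs(base - model(with_pixel(image, r, c, baseline))) for c in range(len(row))]
        for r, row in enumerate(image)
    ]


def pixel_sensitivity(model, image, step=1e-3):
    """Sensitivity of the output to a small change in each pixel value."""
    base = model(image)
    return [
        [(model(with_pixel(image, r, c, value + step)) - base) / step
         for c, value in enumerate(row)]
        for r, row in enumerate(image)
    ]


image = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
importance = occlusion_importance(toy_score, image)
sensitivity = pixel_sensitivity(toy_score, image)
```

For a color image, the same loops could be repeated over each color channel of each pixel. Both functions evaluate the model once per pixel, so gradient-based methods may be preferable for large images.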

Virtualized application 1110 may be in communication with a business computing environment for the business, e.g., via a communication network (e.g., the Internet, a local network, or any other suitable communication network) or any other suitable means of communication. Virtualized application 1110 may provide information to the business computing environment. For example, virtualized application 1110 may provide ML output from model inference module 1116 and information explaining the ML output from produce insights module 1118 to recommendations/services module 1142. In another example, virtualized application 1110 may provide ML output from model inference module 1116 to integrate outputs module 1128 to process the ML output and provide related information via user interface 1136, reports 1138, and/or third party applications 1140 deployed for the business computing environment.

Virtualized application 1110 may receive new data from collect outputs module 1130. The new data may provide ground truth outputs for previous inputs on which the trained ML model was applied to generate ML outputs. Additionally or alternatively, the new data may include additional training data from recent customers to which the trained ML model has not yet been applied. The new data may be used to monitor performance of the trained ML model using performance monitoring module 1132. Based on monitoring the performance, trigger model retraining module 1134 may trigger an update for the trained ML model if the model performance is below a specified threshold. For example, if the error of the ML output is not below a specified error threshold, trigger model retraining module 1134 may trigger an update for the trained ML model using prior training data and/or the new data. Additionally or alternatively, trigger model retraining module 1134 may trigger an update for the trained ML model on a periodic basis (e.g., every week, every month, every two months, or any other suitable interval). Additionally or alternatively, trigger model retraining module 1134 may trigger an update for the trained ML model when a threshold amount of new data is available (e.g., 20% of the size of the training data initially used to generate the trained ML model, 50% of the size of the training data initially used to generate the trained ML model, or any other suitable threshold).
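
The three retraining triggers described above may be combined as in the following Python sketch. The function name `should_retrain`, its parameters, and the numeric values used in the usage note are assumptions made for illustration.

```python
from datetime import date, timedelta


def should_retrain(
    current_error,
    error_threshold,
    last_trained,
    today,
    retrain_interval,
    new_rows,
    initial_training_rows,
    new_data_fraction=0.2,
):
    """Return the list of triggers that currently call for a model update."""
    reasons = []
    # Trigger 1: the monitored error is not below the error threshold.
    if current_error >= error_threshold:
        reasons.append("error_threshold")
    # Trigger 2: the periodic retraining interval has elapsed.
    if today - last_trained >= retrain_interval:
        reasons.append("periodic")
    # Trigger 3: enough new data has arrived relative to the initial training set.
    if new_rows >= new_data_fraction * initial_training_rows:
        reasons.append("new_data")
    return reasons
```

For instance, with an error threshold of 0.25, a 30-day interval, and 1000 initial training rows, a monitored error of 0.30 nine days after the last training with 100 new rows would yield only the `error_threshold` trigger. Returning the reasons, rather than a single boolean, lets the caller decide whether to retrain on prior data, new data, or both.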

An illustrative implementation of a computing system 1200 that may be used in connection with any of the embodiments of the disclosure provided herein is shown in FIG. 12. For example, any of the computing devices described above may be implemented as computing system 1200. The computing system 1200 may include one or more computer hardware processors 1202 and one or more articles of manufacture that comprise non-transitory computer-readable storage media (e.g., memory 1204 and one or more non-volatile storage devices 1206). The processor(s) 1202 may control writing data to and reading data from the memory 1204 and the non-volatile storage device(s) 1206 in any suitable manner. To perform any of the functionality described herein, the processor(s) 1202 may execute one or more processor-executable instructions stored in one or more non-transitory computer-readable storage media (e.g., the memory 1204), which may serve as non-transitory computer-readable storage media storing processor-executable instructions for execution by the processor(s) 1202.

The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that may be employed to program a computer or other processor to implement various aspects of embodiments as described above. Additionally, according to one aspect, one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.

Processor-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed.

Also, data structures may be stored in one or more non-transitory computer-readable storage media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey a relationship between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.

As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, for example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term). The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof, is meant to encompass the items listed thereafter and additional items.

Having described several embodiments of the techniques described herein in detail, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the disclosure. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The techniques are limited only as defined by the following claims and the equivalents thereto.

1. A method, comprising: using at least one computer hardware processor to perform: (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.
2. The method of claim 1, wherein the at least some of the second plurality of attributes include a second group of attributes different from the first group of attributes, and wherein the second plurality of values includes a second group of attribute values for the second group of attributes, wherein (C) further comprises: identifying, using the common data attribute schema and from the plurality of pre-existing software codes, second software code implementing a second ML data processing pipeline, different from the first ML data processing pipeline, configured to generate a second group of feature values, for a respective second group of features, from the second group of attribute values, wherein (D) further comprises: processing the second group of attribute values with the second software code to obtain the second group of feature values, and wherein (E) further comprises: either: (i) providing the second group of feature values as inputs to the ML model for generating the corresponding ML model outputs, or (ii) using the second group of feature values to train the ML model.
3. The method of claim 1, wherein acts (A)-(E) are performed by a virtualized ML application program executing using the at least one processor.
4. The method of claim 1, wherein the common data attribute schema indicates which attributes in the second plurality of attributes are mandatory or optional.
5. The method of claim 4, wherein processing the first data comprises: accessing values for those attributes, among the first plurality of attributes, that are indicated as being mandatory by the common data attribute schema; and generating an error notification when the first data does not include values for at least one of the attributes indicated as being mandatory by the common data attribute schema.
6. The method of claim 5, wherein the common data attribute schema indicates a format for the second plurality of values, and wherein processing the first data in accordance with the common data attribute schema comprises formatting the accessed values according to the format indicated by the common data attribute schema.
7. The method of claim 1, wherein the common data attribute schema categorizes attributes in the second plurality of attributes into multiple categories, the multiple categories including: a common attribute category; a market segment attribute category; and a business specific attribute category.
8. The method of claim 1, further comprising updating the common data attribute schema to include one or more attributes part of the first plurality of attributes, but not part of the second plurality of attributes.
9. The method of claim 1, wherein acts (C) and (D) are performed automatically based on information in the common data attribute schema.
10. The method of claim 1, wherein the first software code implementing the first ML data processing pipeline is configured to, when executed, generate the first group of feature values from the first group of attribute values using first data cleansing procedures and first feature extraction procedures.
11. The method of claim 2, wherein the second software code implementing the second ML data processing pipeline is configured to, when executed, generate the second group of feature values from the second group of attribute values using second data cleansing procedures different from the first data cleansing procedures and second feature extraction procedures different from the first feature extraction procedures.
12. A system, comprising: at least one computer hardware processor; and at least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method, comprising: (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.
13. The system of claim 12, wherein the at least some of the second plurality of attributes include a second group of attributes different from the first group of attributes, and wherein the second plurality of values includes a second group of attribute values for the second group of attributes, wherein (C) further comprises: identifying, using the common data attribute schema and from the plurality of pre-existing software codes, second software code implementing a second ML data processing pipeline, different from the first ML data processing pipeline, configured to generate a second group of feature values, for a respective second group of features, from the second group of attribute values, wherein (D) further comprises: processing the second group of attribute values with the second software code to obtain the second group of feature values, and wherein (E) further comprises: either: (i) providing the second group of feature values as inputs to the ML model for generating the corresponding ML model outputs, or (ii) using the second group of feature values to train the ML model.
 14. The system of claim 12, wherein acts (A)-(E) are performed by a virtualized ML application program executing using the at least one processor.
 15. The system of claim 12, wherein the common data attribute schema indicates which attributes in the second plurality of attributes are mandatory or optional.
 16. At least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method, comprising (A) receiving first data associated with a business, the first data comprising a first plurality of values for a first plurality of attributes; (B) processing the first data, in accordance with a common data attribute schema that indicates a second plurality of attributes, to generate a second plurality of values for at least some of the second plurality of attributes, wherein the at least some of the second plurality of attributes include a first group of attributes, and wherein the second plurality of values includes a first group of attribute values for the first group of attributes; (C) identifying, using the common data attribute schema and from among a plurality of pre-existing software codes, first software code implementing a first ML data processing pipeline configured to generate a first group of feature values, for a respective first group of features, from the first group of attribute values; (D) processing the first group of attribute values with the first software code to obtain the first group of feature values; and (E) either: (i) providing the first group of feature values as inputs to a machine learning (ML) model for generating corresponding ML model outputs, or (ii) using the first group of feature values to train the ML model.
 17. The at least one non-transitory computer-readable storage medium of claim 16, wherein the at least some of the second plurality of attributes include a second group of attributes different from the first group of attributes, and wherein the second plurality of values includes a second group of attribute values for the second group of attributes, wherein (C) further comprises: identifying, using the common data attribute schema and from the plurality of pre-existing software codes, second software code implementing a second ML data processing pipeline, different from the first ML data processing pipeline, configured to generate a second group of feature values, for a respective second group of features, from the second group of attribute values, wherein (D) further comprises: processing the second group of attribute values with the second software code to obtain the second group of feature values, and wherein (E) further comprises: either: (i) providing the second group of feature values as inputs to the ML model for generating the corresponding ML model outputs, or (ii) using the second group of feature values to train the ML model.
 18. The at least one non-transitory computer-readable storage medium of claim 16, wherein acts (A)-(E) are performed by a virtualized ML application program executing using the at least one processor.
 19. The at least one non-transitory computer-readable storage medium of claim 16, wherein the common data attribute schema indicates which attributes in the second plurality of attributes are mandatory or optional.
 20. The at least one non-transitory computer-readable storage medium of claim 19, wherein processing the first data comprises: accessing values for those attributes, among the first plurality of attributes, that are indicated as being mandatory by the common data attribute schema; and generating an error notification when the first data does not include values for at least one of the attributes indicated as being mandatory by the common data attribute schema.
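
By way of non-limiting illustration only, acts (A)-(D) and the mandatory-attribute check recited above may be sketched in Python as follows. The sketch is not part of the claims and does not purport to be the applicant's implementation. Every name in it (`SCHEMA`, `SOURCE_TO_SCHEMA`, `PIPELINES`, `conform`, `identify_pipelines`, `featurize`, and the attribute and feature names) is a hypothetical assumption introduced for the example. It assumes the common data attribute schema can be represented as a mapping from attribute name to a mandatory flag, and that each pre-existing software code is registered against the group of schema attributes it consumes.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical common data attribute schema: attribute name -> is it mandatory?
SCHEMA: Dict[str, bool] = {
    "order_total": True,
    "order_count": True,
    "region": False,
}

# Hypothetical mapping from a business's own attribute names to schema names.
SOURCE_TO_SCHEMA: Dict[str, str] = {
    "total": "order_total",
    "n_orders": "order_count",
    "area": "region",
}


class MissingMandatoryAttribute(ValueError):
    """Error notification raised when a mandatory schema attribute has no value."""


def conform(first_data: Dict[str, object]) -> Dict[str, object]:
    """Act (B): map source values onto schema attributes and check mandatory ones."""
    values = {
        SOURCE_TO_SCHEMA[name]: value
        for name, value in first_data.items()
        if name in SOURCE_TO_SCHEMA
    }
    missing = sorted(
        name for name, mandatory in SCHEMA.items() if mandatory and name not in values
    )
    if missing:
        raise MissingMandatoryAttribute(f"missing mandatory attributes: {missing}")
    return values


def order_features(values: Dict[str, object]) -> Dict[str, float]:
    """A hypothetical pre-existing pipeline for the order attribute group."""
    count = max(int(values["order_count"]), 1)  # guard against division by zero
    return {"avg_order_value": float(values["order_total"]) / count}


Pipeline = Callable[[Dict[str, object]], Dict[str, float]]

# Hypothetical registry: the attribute group a pipeline needs -> the pipeline.
PIPELINES: List[Tuple[FrozenSet[str], Pipeline]] = [
    (frozenset({"order_total", "order_count"}), order_features),
]


def identify_pipelines(values: Dict[str, object]) -> List[Pipeline]:
    """Act (C): select pipelines whose required attribute group is available."""
    return [fn for required, fn in PIPELINES if required.issubset(values)]


def featurize(first_data: Dict[str, object]) -> Dict[str, float]:
    """Acts (A)-(D): conform the data, identify pipelines, compute feature values."""
    values = conform(first_data)
    features: Dict[str, float] = {}
    for pipeline in identify_pipelines(values):
        features.update(pipeline(values))
    return features


if __name__ == "__main__":
    print(featurize({"total": 100.0, "n_orders": 4}))
```

In this sketch the resulting feature values would then be provided to an ML model for inference or used for training, per act (E); the model itself is omitted. Keying the registry by attribute group is one possible way to let the schema drive pipeline selection, and other arrangements are equally consistent with the claim language.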